[SQL] window function to sort times series data?
Hi, I have time series data: price(id_price int, price int, created_on timestamp) I'd like to select the latest price before, say, 2010-03-10 and the latest price after that date. Using "group by" and self-joins I was able to build a (quite large :) working query. But I wonder if there is a cleaner, shorter solution with a window function. I tried something like: select * from (select first_value(p.id_price) over w as first_id_price, first_value(p.price) over w as first_price, first_value(p.created_on::date) over w as first_date, nth_value(p.id_price,2) over w as second_id_price, nth_value(p.price,2) over w as second_price, nth_value(p.created_on::date,2) over w as second_date, p.id_price from price p window w as (order by p.created_on > '2010-03-10, p.id_price desc rows between unbounded preceding and unbounded following)) as t where first_id_price=id_price; But this doesn't return correct results. Thanks for any suggestions, -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] window function to sort times series data?
In response to Louis-David Mitterrand : > Hi, > > I have time series data: price(id_price int, price int, created_on timestamp) > > I'd like to select the latest price before, say, 2010-03-10 and the > latest price after that date. test=*# select * from price ; id_price | price | created_on --+---+- 1 |10 | 2010-01-01 00:00:00 1 |12 | 2010-02-01 00:00:00 1 | 8 | 2010-03-01 00:00:00 1 |15 | 2010-03-10 00:00:00 1 |13 | 2010-03-20 00:00:00 (5 rows) test=*# select * from ( select distinct on(id_price) id_price, price, created_on from price where created_on < '2010-02-20'::date order by id_price, created_on desc ) foo union all select * from ( select distinct on(id_price) id_price, price, created_on from price where created_on > '2010-02-20'::date order by id_price, created_on asc ) bar order by id_price,created_on ; id_price | price | created_on --+---+- 1 |12 | 2010-02-01 00:00:00 1 | 8 | 2010-03-01 00:00:00 (2 rows) That's okay for you? Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431 2EB0 389D 1DC2 3172 0C99 -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] window function to sort times series data?
On Wed, Mar 24, 2010 at 03:29:36PM +0100, A. Kretschmer wrote: > In response to Louis-David Mitterrand : > > Hi, > > > > I have time series data: price(id_price int, price int, created_on > > timestamp) > > > > I'd like to select the latest price before, say, 2010-03-10 and the > > latest price after that date. > > test=*# select * from price ; > id_price | price | created_on > --+---+- > 1 |10 | 2010-01-01 00:00:00 > 1 |12 | 2010-02-01 00:00:00 > 1 | 8 | 2010-03-01 00:00:00 > 1 |15 | 2010-03-10 00:00:00 > 1 |13 | 2010-03-20 00:00:00 > (5 rows) > > test=*# select * from ( > select distinct on(id_price) id_price, price, created_on from price where > created_on < '2010-02-20'::date order by id_price, created_on desc > ) foo union all select * from ( > select distinct on(id_price) id_price, price, created_on from price where > created_on > '2010-02-20'::date order by id_price, created_on asc > ) bar order by id_price,created_on ; > id_price | price | created_on > --+---+- > 1 |12 | 2010-02-01 00:00:00 > 1 | 8 | 2010-03-01 00:00:00 > (2 rows) > > That's okay for you? Yes, that works, but I forgot in my specs (!) that I'd like the two prices (pre and post 2010-03-10) to be returned on the same row and only if a post-2010-03-10 price exists. Thanks, -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] window function to sort times series data?
In response to Louis-David Mitterrand : > On Wed, Mar 24, 2010 at 03:29:36PM +0100, A. Kretschmer wrote: > > In response to Louis-David Mitterrand : > > > Hi, > > > > > > I have time series data: price(id_price int, price int, created_on > > > timestamp) > > > > > > I'd like to select the latest price before, say, 2010-03-10 and the > > > latest price after that date. > > > > test=*# select * from price ; > > id_price | price | created_on > > --+---+- > > 1 |10 | 2010-01-01 00:00:00 > > 1 |12 | 2010-02-01 00:00:00 > > 1 | 8 | 2010-03-01 00:00:00 > > 1 |15 | 2010-03-10 00:00:00 > > 1 |13 | 2010-03-20 00:00:00 > > (5 rows) > > > > test=*# select * from ( > > select distinct on(id_price) id_price, price, created_on from price where > > created_on < '2010-02-20'::date order by id_price, created_on desc > > ) foo union all select * from ( > > select distinct on(id_price) id_price, price, created_on from price where > > created_on > '2010-02-20'::date order by id_price, created_on asc > > ) bar order by id_price,created_on ; > > id_price | price | created_on > > --+---+- > > 1 |12 | 2010-02-01 00:00:00 > > 1 | 8 | 2010-03-01 00:00:00 > > (2 rows) > > > > That's okay for you? > > Yes, that works, but I forgot in my specs (!) that I'd like the two > prices (pre and post 2010-03-10) to be returned on the same row and only > if a post-2010-03-10 price exists. Well: test=*# select * from price ; id_price | price | created_on --+---+- 1 |10 | 2010-01-01 00:00:00 1 |12 | 2010-02-01 00:00:00 1 | 8 | 2010-03-01 00:00:00 1 |15 | 2010-03-10 00:00:00 1 |13 | 2010-03-20 00:00:00 (5 rows) test=*# select id_price, sum(case when pre_post='pre' then price else 0 end) as pre, sum(case when pre_post='post' then price else 0 end) as post, max(case when pre_post='pre' then created_on else null end) as date_pre, max(case when pre_post='post' then created_on else null end) as date_post from (select * from (select distinct on(id_price) 'pre'::text as pre_post, id_price, price, created_on from price where created_on < '2010-02-20'::date order by id_price, created_on desc) foo union all select * from (select distinct on(id_price) 'post'::text, id_price, price, created_on from price where created_on > '2010-02-20'::date order by id_price,created_on asc) bar order by id_price, created_on) foo group by id_price; id_price | pre | post | date_pre | date_post --+-+--+-+- 1 | 12 |8 | 2010-02-01 00:00:00 | 2010-03-01 00:00:00 (1 row) You can filter that result for rows where date_post is not null. Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431 2EB0 389D 1DC2 3172 0C99 -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] window function to sort times series data?
A. Kretschmer wrote: Well, and now i'm using 8.4 windowing-functions: test=*# select * from price order by price_id, d; price_id | price | d --+---+ 1 |10 | 2010-03-12 1 |11 | 2010-03-19 1 |12 | 2010-03-26 1 |13 | 2010-04-02 1 |14 | 2010-04-09 1 |15 | 2010-04-16 1 |16 | 2010-04-23 1 |17 | 2010-04-30 2 |20 | 2010-03-12 2 |21 | 2010-03-19 2 |22 | 2010-03-26 2 |23 | 2010-04-02 (12 Zeilen) -- now i'm searching for 2010-03-20: Zeit: 0,319 ms test=*# select price_id, sum(case when d < '2010-03-20'::date then price else 0 end) as price_old, sum(case when d > '2010-03-20'::date then price else 0 end) as price_new, max(case when d < '2010-03-20'::date then d else null end) as date_old, max(case when d > '2010-03-20'::date then d else null end) as date_new from (select price_id, price, d, lag(d) over(partition by price_id order by d), lead(d) over(partition by price_id order by d) from price) foo where '2010-03-20'::date between lag and lead group by price_id; price_id | price_old | price_new | date_old | date_new --+---+---++ 1 |11 |12 | 2010-03-19 | 2010-03-26 2 |21 |22 | 2010-03-19 | 2010-03-26 (2 Zeilen) Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889° -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] window function to sort times series data?
On Wed, Mar 24, 2010 at 05:29:46PM +0100, Andreas Kretschmer wrote: > A. Kretschmer wrote: > > Well, and now i'm using 8.4 windowing-functions: > > test=*# select * from price order by price_id, d; > price_id | price | d > --+---+ > 1 |10 | 2010-03-12 > 1 |11 | 2010-03-19 > 1 |12 | 2010-03-26 > 1 |13 | 2010-04-02 > 1 |14 | 2010-04-09 > 1 |15 | 2010-04-16 > 1 |16 | 2010-04-23 > 1 |17 | 2010-04-30 > 2 |20 | 2010-03-12 > 2 |21 | 2010-03-19 > 2 |22 | 2010-03-26 > 2 |23 | 2010-04-02 > (12 Zeilen) > > -- now i'm searching for 2010-03-20: > > Zeit: 0,319 ms > test=*# select price_id, sum(case when d < '2010-03-20'::date then price > else 0 end) as price_old, sum(case when d > '2010-03-20'::date then > price else 0 end) as price_new, max(case when d < '2010-03-20'::date > then d else null end) as date_old, max(case when d > '2010-03-20'::date > then d else null end) as date_new from (select price_id, price, d, > lag(d) over(partition by price_id order by d), lead(d) over(partition by > price_id order by d) from price) foo where '2010-03-20'::date between > lag and lead group by price_id; > price_id | price_old | price_new | date_old | date_new > --+---+---++ > 1 |11 |12 | 2010-03-19 | 2010-03-26 > 2 |21 |22 | 2010-03-19 | 2010-03-26 > (2 Zeilen) Nice use of lag() and lead() functions. In my db id_price is a serial so it's easy to use in an aggregate to determine the latest. I also looked at window functions and did the following: select p3.price as first_price, p4.price as second_price from (select first_value(max(p.id_price)) over w as first_id_price, nth_value(max(p.id_price),2) over w as second_id_price, p.created_on > '2010-03-20' as is_new_price from price p group by p.created_on > '2010-03-20' window w as (order by p.created_on > '2010-03-20' desc rows between unbounded preceding and unbounded following) ) as t join price p3 on (t.first_id_price=p3.id_price) left join price p4 on (t.second_id_price=p4.id_price) where t.is_new_price is true test=# \e first_price | second_price -+-- 17 | 11 (1 row) Is there some potential optimizations or flaws? Here is the test database: -- -- PostgreSQL database dump -- SET statement_timeout = 0; SET client_encoding = 'SQL_ASCII'; SET standard_conforming_strings = off; SET check_function_bodies = false; SET client_min_messages = warning; SET escape_string_warning = off; SET search_path = public, pg_catalog; SET default_tablespace = ''; SET default_with_oids = false; -- -- Name: price; Type: TABLE; Schema: public; Owner: ldm; Tablespace: -- CREATE TABLE price ( id_price integer NOT NULL, price integer, created_on timestamp without time zone ); ALTER TABLE public.price OWNER TO ldm; -- -- Name: price_id_price_seq; Type: SEQUENCE; Schema: public; Owner: ldm -- CREATE SEQUENCE price_id_price_seq START WITH 1 INCREMENT BY 1 NO MAXVALUE NO MINVALUE CACHE 1; ALTER TABLE public.price_id_price_seq OWNER TO ldm; -- -- Name: price_id_price_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: ldm -- ALTER SEQUENCE price_id_price_seq OWNED BY price.id_price; -- -- Name: price_id_price_seq; Type: SEQUENCE SET; Schema: public; Owner: ldm -- SELECT pg_catalog.setval('price_id_price_seq', 8, true); -- -- Name: id_price; Type: DEFAULT; Schema: public; Owner: ldm -- ALTER TABLE price ALTER COLUMN id_price SET DEFAULT nextval('price_id_price_seq'::regclass); -- -- Data for Name: price; Type: TABLE DATA; Schema: public; Owner: ldm -- COPY price (id_price, price, created_on) FROM stdin; 1 10 2010-03-12 00:00:00 2 11 2010-03-19 00:00:00 3 12 2010-03-26 00:00:00 4 13 2010-04-02 00:00:00 5 14 2010-04-09 00:00:00 6 15 2010-04-16 00:00:00 7 16 2010-04-23 00:00:00 8 17 2010-04-30 00:00:00 \. -- -- Name: public; Type: ACL; Schema: -; Owner: postgres -- REVOKE ALL ON SCHEMA public FROM PUBLIC; REVOKE ALL ON SCHEMA public FROM postgres; GRANT ALL ON SCHEMA public TO postgres; GRANT ALL ON SCHEMA public TO PUBLIC; -- -- PostgreSQL database dump complete -- -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] window function to sort times series data?
In going through the arcana of string functions, I have come across the following series of selects that contain, for me, a mysterious "$re$". -- return all matches from regexp SELECT regexp_matches('foobarbequebaz', $re$(bar)(beque)$re$); -- test case insensitive SELECT regexp_matches('foObARbEqUEbAz', $re$(bar)(beque)$re$, 'i'); -- global option - more than one match SELECT regexp_matches('foobarbequebazilbarfbonk', $re$(b[^b]+)(b[^b]+)$re$, 'g'); When I run this I get: regexp_matches -- I have not been able to find out what it all means. Forgive me for my blindness. John P.S. The author of the statements is "alexk" at Command Prompt. They are test statements against Postgres' string functions.
Re: [SQL] window function to sort times series data?
On Wed, Mar 24, 2010 at 2:38 PM, John Gage wrote: > In going through the arcana of string functions, I have come across the > following series of selects that contain, for me, a mysterious "$re$". > > -- return all matches from regexp > SELECT regexp_matches('foobarbequebaz', $re$(bar)(beque)$re$); The $re$ is just an arbitrary identifier for a dollar-quoted string constant. See: http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html Maybe you're on an old version which doesn't support dollar-quoted strings? On 8.3 for the above query, I get: SELECT regexp_matches('foobarbequebaz', $re$(bar)(beque)$re$); regexp_matches {bar,beque} (1 row) Josh -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] Does IMMUTABLE property propagate?
On Sat, Mar 06, 2010 at 04:31:55PM -0500, Tom Lane wrote: > Petru Ghita writes: > > "..immediately replaced with the function value" doesn't mean that the > > results of a previously evaluated function for the same parameters are > > stored and reused? > > No, it means what it says: the function is executed once and replaced > with a constant representing the result value. So for example a function like: CREATE OR REPLACE FUNCTION shorten_cruise_type(intext text) RETURNS text AS $$ declare outtext text; begin outtext = trim(regexp_replace(intext, E'\\s*Short( Break)?', '', 'i')); return outtext; end; $$ LANGUAGE plpgsql; could/should be declared immutable? Thanks, -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql