Re: [GENERAL] Re: Need help with what I think is likely a simple query - for each distinct val, return only one record with the most recent date.

2011-09-13 Thread Vincent de Phily
On Monday 12 September 2011 22:51:54 Reid Thompson wrote:
 test=# select distinct on (val1) val1, val2, val3 from (SELECT
 max(val3) OVER (PARTITION BY val1), * FROM sampledata) as sq where val3
 = max order by val1;

Other things I've tried (was limited to PG8.3 so no OVER (PARTITION...) 
support) :

SELECT DISTINCT ON (val1), val2, val3 FROM table ORDER BY val1, val3;

SELECT val1, val2, val3 FROM table WHERE id IN (
   SELECT sq.i FROM (
  SELECT val1, max(val3) FROM table GROUP by 1
   ) AS sq (v,i))

My case was a bit different since I wanted the record for distinct(A,B) 
instead of just DISTINC(A), and since I had a primary key available on the 
table. But let it be food for thought.



However, none of those queries are either efficient or beautiful, so I ended 
up populating a last_values table via a trigger, which is way more efficient 
if it fits your needs :

CREATE TABLE last_values (PRIMARY KEY foo, bar integer, baz integer);

CREATE OR REPLACE FUNCTION insert_last_value() RETURNS TRIGGER AS $$
BEGIN
   UPDATE last_values SET val1=NEW.val1... WHERE ...;
   IF NOT found THEN
  BEGIN
 INSERT INTO last_values (...) VALUES (NEW);
 EXCEPTION
WHEN UNIQUE_VIOLATION THEN
   UPDATE last_values SET ... WHERE ...;
 END;
END IF;
RETURN NULL;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER last_values_trigger AFTER INSERT ON values FOR EACH ROW EXECUTE 
PROCEDURE insert_last_values();

-- 
Vincent de Phily

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Re: Need help with what I think is likely a simple query - for each distinct val, return only one record with the most recent date.

2011-09-13 Thread Hannes Erven
Reid,


 where any one of these 3

 11   2011-01-01
 11   2011-01-01
 13   2011-01-01

 or any one of these 2
 31   2011-01-05
 32   2011-01-05

 are suitable for val = 1, val = 3 respectively.


Can you please describe in words what you are trying to accomplish? When
I look at your data and expected output, I'd say you want this:

   For each distinct value of val1, return any value of val2 and
   the lowest value of date.


This is actually quite simple - you could also use max(), avg(), ...
instead of min for val2:

SELECT val1, min(val2), min(date)
FROM data
GROUP BY val1


Best regards

-hannes

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Re: Need help with what I think is likely a simple query - for each distinct val, return only one record with the most recent date.

2011-09-13 Thread Reid Thompson
On Tue, 2011-09-13 at 16:39 +0200, Hannes Erven wrote:
 Reid,
 
 
  where any one of these 3
 
  11   2011-01-01
  11   2011-01-01
  13   2011-01-01
 
  or any one of these 2
  31   2011-01-05
  32   2011-01-05
 
  are suitable for val = 1, val = 3 respectively.
 
 
 Can you please describe in words what you are trying to accomplish? When
 I look at your data and expected output, I'd say you want this:
 
For each distinct value of val1, return any value of val2 and
the lowest value of date.

for each distinct value of val1', return the highest value(most recent)
of date and the value of val2 associated with that date




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Re: Need help with what I think is likely a simple query - for each distinct val, return only one record with the most recent date.

2011-09-13 Thread Eduardo Piombino
I'm sorry Reid, driving back home I realized that the previous query I
suggested didn't do what I was expecting, cause it would compute all of val2
for each val1, even if they belonged to another group (not for a particular
val1/date pair), or in other words, to another date.

I've considered this fact in the previous post but the resulting query
appeared too complex for such a simple task, and then disregarded it, but
well, after all it seems it was necessary to do this little extra work,
because you wanted the exact val2 associated to the max(date) for val1.

I've come up with this alternative, basically a key extractor followed by
diving in the original table looking for val2 for that key.
As I've said before, the data you provided showed that there could be
multiple rows for each key made of val1 and max(date) for that specific
val1.

So, you will still have to define a criteria on which val2 to keep.
Since I don't know the reason for this query, I've suggested an array_agg so
that it is more clear to you.

select
   a.val1,
   a.date,
   array_agg(mytable.val2) -- given there can be multiple rows for any
val1/date pair, this is where you are allowed to define which one you want
(or all of them)
from (
   select val1, max(date) as date from mytable group by val1) a -- this
gives you only rows satisfying val1/max(date)
   inner join mytable on a.val1 = mytable.val1 and a.date = mytable.date --
this join allows the retrieval of val2 for that key formed by
val1/max(date), but be prepared for many rows
group by
   a.val1,
   a.date
order by
   val1;

test case:


select val1, val2, date from mytable;

1;16;2011-09-13;1
1;15;2011-09-13;2
1;14;2011-09-13;3

1;23;2011-09-12;4
1;22;2011-09-12;5
1;21;2011-09-12;6

2;6;2011-09-13;7
2;5;2011-09-13;8
2;4;2011-09-13;9

2;3;2011-09-14;10
2;2;2011-09-14;11
2;1;2011-09-14;12

This is the result coming from the new query, you see, only val2's of 14, 15
and 16 are computed (you will still eventually have to select one from it,
using a more specific aggregate, like max, min, avg, etc.)
1;2011-09-13;{14,15,16}
2;2011-09-14;{1,2,3}


This is the result coming from the original, simpler (but flawed) query,
which as it clearly shows computes val2's of 14, 15, 16, 23, 22 and 21,
beging those last 3 (23, 22 and 21) from another date associated with val1
(which is not the max date, so they shouldn't have been considered at all).

select val1, max(date), array_agg(val2) from mytable group by val1

1;2011-09-13;{14,15,16,23,22,21}
2;2011-09-14;{1,2,3,4,5,6}


hope it helps.
regards,
eduardo


On Tue, Sep 13, 2011 at 1:13 PM, Reid Thompson reid.thomp...@ateb.comwrote:

 On Tue, 2011-09-13 at 16:39 +0200, Hannes Erven wrote:
  Reid,
 
 
   where any one of these 3
  
   11   2011-01-01
   11   2011-01-01
   13   2011-01-01
  
   or any one of these 2
   31   2011-01-05
   32   2011-01-05
  
   are suitable for val = 1, val = 3 respectively.
 
 
  Can you please describe in words what you are trying to accomplish? When
  I look at your data and expected output, I'd say you want this:
 
 For each distinct value of val1, return any value of val2 and
 the lowest value of date.

 for each distinct value of val1', return the highest value(most recent)
 of date and the value of val2 associated with that date




 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



[GENERAL] Re: Need help with what I think is likely a simple query - for each distinct val, return only one record with the most recent date.

2011-09-12 Thread Reid Thompson

On 9/12/2011 9:54 PM, Reid Thompson wrote:

Ack -- i flubbed the subject and sample.
The sample data should be

val  val2date
11   2011-01-01
22   2011-01-02
33   2011-01-03
41   2011-01-04
52   2011-01-05
53   2011-01-01
41   2011-01-02
62   2011-01-03
43   2011-01-04
31   2011-01-05
22   2011-01-06
43   2011-01-07
61   2011-01-08
42   2011-01-09
53   2011-01-01
21   2011-01-02
42   2011-01-03
23   2011-01-04
11   2011-01-01
22   2011-01-02
33   2011-01-03
41   2011-01-04
32   2011-01-05
13   2011-01-01
21   2011-01-02
32   2011-01-03
43   2011-01-04
54   2012-01-01

resultset:

13   2011-01-01
22   2011-01-06
31   2011-01-05
42   2011-01-09
52   2011-01-05
61   2011-01-08

where any one of these 3

11   2011-01-01
11   2011-01-01
13   2011-01-01

or any one of these 2
31   2011-01-05
32   2011-01-05

are suitable for val = 1, val = 3 respectively.




sigh -- looks like I fat fingered one of my resultset values above.
But, I think this gives me what I want:

test=# select distinct on (val1) val1, val2, val3 from (SELECT  
max(val3) OVER (PARTITION BY val1), * FROM sampledata) as sq where val3 
= max order by val1;

 val1 | val2 |val3
--+--+
 1| 3| 2011-01-01
 2| 2| 2011-01-06
 3| 2| 2011-01-05
 4| 2| 2011-01-09
 5| 4| 2012-01-01
 6| 1| 2011-01-08
(6 rows)



 val1 | val2 |val3
--+--+
 1| 3| 2011-01-01
 1| 1| 2011-01-01
 1| 1| 2011-01-01
 2| 2| 2011-01-02
 2| 1| 2011-01-02
 2| 3| 2011-01-04
 2| 1| 2011-01-02
 2| 2| 2011-01-06
 2| 2| 2011-01-02
 3| 3| 2011-01-03
 3| 3| 2011-01-03
 3| 2| 2011-01-05
 3| 1| 2011-01-05
 3| 2| 2011-01-03
 4| 1| 2011-01-04
 4| 1| 2011-01-02
 4| 3| 2011-01-04
 4| 3| 2011-01-07
 4| 3| 2011-01-04
 4| 2| 2011-01-09
 4| 1| 2011-01-04
 4| 2| 2011-01-03
 5| 4| 2012-01-01
 5| 2| 2011-01-05
 5| 3| 2011-01-01
 5| 3| 2011-01-01
 6| 1| 2011-01-08
 6| 2| 2011-01-03
(28 rows)


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Re: Need help with what I think is likely a simple query - for each distinct val, return only one record with the most recent date.

2011-09-12 Thread Reid Thompson

Ack -- i flubbed the subject and sample.
The sample data should be

val  val2date
11   2011-01-01
22   2011-01-02
33   2011-01-03
41   2011-01-04
52   2011-01-05
53   2011-01-01
41   2011-01-02
62   2011-01-03
43   2011-01-04
31   2011-01-05
22   2011-01-06
43   2011-01-07
61   2011-01-08
42   2011-01-09
53   2011-01-01
21   2011-01-02
42   2011-01-03
23   2011-01-04
11   2011-01-01
22   2011-01-02
33   2011-01-03
41   2011-01-04
32   2011-01-05
13   2011-01-01
21   2011-01-02
32   2011-01-03
43   2011-01-04
54   2012-01-01

resultset:

13   2011-01-01
22   2011-01-06
31   2011-01-05
42   2011-01-09
52   2011-01-05
61   2011-01-08

where any one of these 3

11   2011-01-01
11   2011-01-01
13   2011-01-01

or any one of these 2
31   2011-01-05
32   2011-01-05

are suitable for val = 1, val = 3 respectively.





On 9/12/2011 8:54 PM, Reid Thompson wrote:

Could someone point me in the right direction..
Thanks - reid

Given the example data,
how do I write a query that will give me the resultset:

12011-01-01
22011-01-06
32011-01-05
42011-01-09
52011-01-05
62011-01-08

I.E. for each distinct val, return the record with the most recent date.

ex data
val  date
12011-01-01
22011-01-02
32011-01-03
42011-01-04
52011-01-05
52011-01-01
42011-01-02
62011-01-03
42011-01-04
32011-01-05
22011-01-06
42011-01-07
62011-01-08
42011-01-09
52011-01-01
22011-01-02
42011-01-03
22011-01-04
12011-01-01
22011-01-02
32011-01-03
42011-01-04
32011-01-05
12011-01-01
22011-01-02
32011-01-03
42011-01-04
52011-01-01
---

$ cat sampledata|sort -k1,2
12011-01-01
12011-01-01
12011-01-01
22011-01-02
22011-01-02
22011-01-02
22011-01-02
22011-01-04
22011-01-06
32011-01-03
32011-01-03
32011-01-03
32011-01-05
32011-01-05
42011-01-02
42011-01-03
42011-01-04
42011-01-04
42011-01-04
42011-01-04
42011-01-07
42011-01-09
52011-01-01
52011-01-01
52011-01-01
52011-01-05
62011-01-03
62011-01-08





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general