Re: [GENERAL] Need help with complicated SQL statement

2007-11-19 Thread Ted Byers
Thanks Shane,

It works reasonably well.  It gets the right answer,
but I guess my data set is much larger than your test.

Please consider the appended data.

The first two SQL statements are directly comparable. 
My Left join is marginally simpler, as shown by
EXPLAIN, and runs to completion in about 3 seconds
(elapsed real time), as opposed to about 30 seconds
for the two views.  It makes a little sense, though,
in that according to EXPLAIN, the LEFT JOIN needs to
consider a dramatically smaller number of rows.  What
I find puzzling, though, is that it takes less time to
get the report for 28 stocks at a given time than it
does to get the report for 1. (Both take about 30
seconds, but for 28 stocks, it takes about 0.005
seconds less time ;-)

This is a case where LEFT JOINS appear to be much
faster than subqueries.

I appreciate all your help, but I am struggling to
figure out how best to adapt my LEFT JOINs in your
VIEWs, so that the latter benefit from the speed of
the JOINs.  The heart of my problem is to figure out
how to  use a stock_id in the WHERE clause.

One thing I am not certain of is, Is there a way to
preserve the logic of the WHERE clauses by replacing
the WHERE clause, which I use to sample the time
series at 22 days ago, 66 days ago, 132 days ago c.,
by a GROUP BY clause, grouping by stock_id?  If so,
might that, along with an additional LEFT JOIN, get me
the result I am after?

I created a stored procedure that takes an id argument
(and can usefully invoke it on any stock_id in the
database), but the problem remains as to how to
construct a record set by applying the procedure to
each id in a set of ids returned, e.g., by SELECT
stock_id FROM stocks;

Ted

=== test data =
EXPLAIN SELECT A1.stock_id,
A1.price_date,
A1.adjusted,
A2.price_date AS pd22,
100.0 * (A1.adjusted -
A2.adjusted)/A2.adjusted AS gl22pc,
A3.price_date AS pd66,
100.0 * (A1.adjusted -
A3.adjusted)/A3.adjusted AS gl66pc,
A4.price_date AS pd132,
100.0 * (A1.adjusted -
A4.adjusted)/A4.adjusted AS gl132pc,
A5.price_date AS pd264,
100.0 * (A1.adjusted -
A5.adjusted)/A5.adjusted AS gl264pc
 FROM
 (SELECT * FROM stockprices WHERE stock_id = 1 ORDER
BY price_date DESC LIMIT 1) AS A1
 LEFT JOIN
 (SELECT * FROM stockprices WHERE stock_id = 1 ORDER
BY price_date DESC LIMIT 1 OFFSET 22) AS A2
 ON A1.stock_id = A2.stock_id
 LEFT JOIN
 (SELECT * FROM stockprices WHERE stock_id = 1 ORDER
BY price_date DESC LIMIT 1 OFFSET 66) AS A3
 ON A1.stock_id = A3.stock_id
 LEFT JOIN
 (SELECT * FROM stockprices WHERE stock_id = 1 ORDER
BY price_date DESC LIMIT 1 OFFSET 132) AS A4
 ON A1.stock_id = A4.stock_id
 LEFT JOIN
 (SELECT * FROM stockprices WHERE stock_id = 1 ORDER
BY price_date DESC LIMIT 1 OFFSET 264) AS A5
 ON A1.stock_id = A5.stock_id;
++-+-++---+-+-+--+---+-+
| id | select_type | table   | type   |
possible_keys | key | key_len | ref  | rows  |
Extra   |
++-+-++---+-+-+--+---+-+
|  1 | PRIMARY | derived2  | system | NULL  
   | NULL| NULL| NULL | 1 | |
|  1 | PRIMARY | derived3  | system | NULL  
   | NULL| NULL| NULL | 1 | |
|  1 | PRIMARY | derived4  | system | NULL  
   | NULL| NULL| NULL | 1 | |
|  1 | PRIMARY | derived5  | system | NULL  
   | NULL| NULL| NULL | 1 | |
|  1 | PRIMARY | derived6  | system | NULL  
   | NULL| NULL| NULL | 1 | |
|  6 | DERIVED | stockprices | ref| PRIMARY   
   | PRIMARY | 4   |  | 17442 | Using where |
|  5 | DERIVED | stockprices | ref| PRIMARY   
   | PRIMARY | 4   |  | 17442 | Using where |
|  4 | DERIVED | stockprices | ref| PRIMARY   
   | PRIMARY | 4   |  | 17442 | Using where |
|  3 | DERIVED | stockprices | ref| PRIMARY   
   | PRIMARY | 4   |  | 17442 | Using where |
|  2 | DERIVED | stockprices | ref| PRIMARY   
   | PRIMARY | 4   |  | 17442 | Using where |
++-+-++---+-+-+--+---+-+
10 rows in set (0.08 sec)

EXPLAIN SELECT * FROM stock_price_history WHERE
stock_id = 1;
+++-+---+---+-+-+---+++
| id | select_type| table   | type  |
possible_keys | key | key_len | ref  
| rows   | Extra  |
+++-+---+---+-+-+---+++
|  1 | PRIMARY| derived3  | ALL   | NULL
 | NULL| NULL| NULL  |   
494 | Using 

Re: [GENERAL] Need help with complicated SQL statement

2007-11-19 Thread Shane Ambler

Ted Byers wrote:

Thanks Shane,

It works reasonably well.  It gets the right answer,
but I guess my data set is much larger than your test.


What indexes have you got?

Using this index on the sample I sent gets the response time to about 
5ms (per stock_id) (as opposed to 900ms with these columns indexed 
separately)


CREATE INDEX idx_stockprices_id_date ON stockprices (stock_id,price_date);



Please consider the appended data.

The first two SQL statements are directly comparable. 
My Left join is marginally simpler, as shown by

EXPLAIN, and runs to completion in about 3 seconds
(elapsed real time), as opposed to about 30 seconds
for the two views.  It makes a little sense, though,
in that according to EXPLAIN, the LEFT JOIN needs to
consider a dramatically smaller number of rows.  What
I find puzzling, though, is that it takes less time to
get the report for 28 stocks at a given time than it
does to get the report for 1. (Both take about 30
seconds, but for 28 stocks, it takes about 0.005
seconds less time ;-)

This is a case where LEFT JOINS appear to be much
faster than subqueries.

I appreciate all your help, but I am struggling to
figure out how best to adapt my LEFT JOINs in your
VIEWs, so that the latter benefit from the speed of
the JOINs.  The heart of my problem is to figure out
how to  use a stock_id in the WHERE clause.


That is where I have moved away from your select - the way you are 
joining makes it hard to adapt to where you want it to end up (more than 
one stock_id per query)


By using the view that generates the rows you want you make the query 
sent from the client so much simpler and make it easy to get any single 
or list of stock_id you want.


The second view will add little overhead and can be part of the select 
sent from the client if you wish. I separated them in to two views to 
prevent duplicating the same selects for the calculations. You can merge 
them into one view if you wish - the first view would become a subselect 
for the second view.




One thing I am not certain of is, Is there a way to
preserve the logic of the WHERE clauses by replacing
the WHERE clause, which I use to sample the time
series at 22 days ago, 66 days ago, 132 days ago c.,
by a GROUP BY clause, grouping by stock_id?  If so,
might that, along with an additional LEFT JOIN, get me
the result I am after?

I created a stored procedure that takes an id argument
(and can usefully invoke it on any stock_id in the
database), but the problem remains as to how to
construct a record set by applying the procedure to
each id in a set of ids returned, e.g., by SELECT
stock_id FROM stocks;

Ted



--

Shane Ambler
[EMAIL PROTECTED]

Get Sheeky @ http://Sheeky.Biz

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] Need help with complicated SQL statement

2007-11-18 Thread Shane Ambler

Ted Byers wrote:


It gave apparently correct values, but for some
reason, it insisted on returning thousands upon
thousands of identical record.  There is something
awry there, but I can't place what.  Yes, I know I
could use SELECT DISTINCT, but I worry that it may be
doing a full table scan, as opposed to the relatively
direct lookup I came up with after looking at your
statement.  I don't yet know how long it would take
because it is the slowest option I tied, and I gave up
after it had returned over 10,000 rows and still
showed no signs of finishing.  I don't understand this
as explain returned apparently much better results for
yours than it did for mine.


Now that I look at it again today I see that - you would either need to 
use SELECT DISTINCT(stock_id) in the VIEW definition or select the 
stock_id from the stock table instead of the stockprices table.


I set up a little test this time - this is the example I came up with -

CREATE DATABASE stocktest;

\c stocktest

CREATE TABLE stocks
(
  id serial PRIMARY KEY,
  description text
);


CREATE TABLE stockprices
(
   id serial PRIMARY KEY,
   stock_id integer REFERENCES stocks (id),
   stock_price numeric,
   price_date date
);

CREATE INDEX idx_stockprices_date ON stockprices (price_date);
CREATE INDEX idx_stockprices_stock_id ON stockprices (stock_id);


CREATE VIEW stock_prices_combined AS
SELECT
id AS stock_id

, (SELECT stock_price FROM stockprices WHERE stock_id = ST.id ORDER BY 
price_date DESC LIMIT 1) as one_adjusted


, (SELECT price_date FROM stockprices WHERE stock_id = ST.id ORDER BY 
price_date DESC LIMIT 1) as one_date


, (SELECT stock_price FROM stockprices WHERE stock_id = ST.id ORDER BY 
price_date DESC LIMIT 1 OFFSET 22) as two_adjusted


, (SELECT price_date FROM stockprices WHERE stock_id = ST.id ORDER BY 
price_date DESC LIMIT 1 OFFSET 22) as two_date


, (SELECT stock_price FROM stockprices WHERE stock_id = ST.id ORDER BY 
price_date DESC LIMIT 1 OFFSET 66) as three_adjusted


, (SELECT price_date FROM stockprices WHERE stock_id = ST.id ORDER BY 
price_date DESC LIMIT 1 OFFSET 66) as three_date


, (SELECT stock_price FROM stockprices WHERE stock_id = ST.id ORDER BY 
price_date DESC LIMIT 1 OFFSET 132) as four_adjusted


, (SELECT price_date FROM stockprices WHERE stock_id = ST.id ORDER BY 
price_date DESC LIMIT 1 OFFSET 132) as four_date


, (SELECT stock_price FROM stockprices WHERE stock_id = ST.id ORDER BY 
price_date DESC LIMIT 1 OFFSET 264) as five_adjusted


, (SELECT price_date FROM stockprices WHERE stock_id = ST.id ORDER BY 
price_date DESC LIMIT 1 OFFSET 264) as five_date


FROM stocks ST;


CREATE VIEW stock_price_history AS
SELECT
stock_id,

one_date AS pd1,
one_adjusted AS current_price,

two_date AS pd22,
100.0 * (one_adjusted - two_adjusted)/two_adjusted AS gl22pc,

three_date AS pd66,
100.0 * (one_adjusted - three_adjusted)/three_adjusted AS gl66pc,

four_date AS pd132,
100.0 * (one_adjusted - four_adjusted)/four_adjusted AS gl132pc,

five_date AS pd264,
100.0 * (one_adjusted - five_adjusted)/five_adjusted AS gl264pc

FROM stock_prices_combined;


I INSERTed 500 stocks entries and 10,000 stockprices entries for each 
stock (that's 5,000,000 price rows), then from


EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id = 20
I got - Total runtime: 981.618 ms

EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id IN 
(10,25,36,45,86,154,368,481)

I got - Total runtime: 8084.217 ms

So that's about 1 second per stock_id returned (on my old machine).


You can change that last query to be -
SELECT * FROM stock_price_history WHERE stock_id IN (SELECT stock_id 
FROM sometable WHERE )


Which gives you the range of stock_id's from a table that you asked about.



--

Shane Ambler
[EMAIL PROTECTED]

Get Sheeky @ http://Sheeky.Biz

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Need help with complicated SQL statement

2007-11-18 Thread Shane Ambler

Shane Ambler wrote:
  I INSERTed 500 stocks entries and 10,000 stockprices entries for each

stock (that's 5,000,000 price rows), then from

EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id = 20
I got - Total runtime: 981.618 ms

EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id IN 
(10,25,36,45,86,154,368,481)

I got - Total runtime: 8084.217 ms



Actually I found a better way - after you run the example I gave you 
before -


DROP INDEX idx_stockprices_date
DROP INDEX idx_stockprices_stock_id

CREATE INDEX idx_stockprices_id_date ON stockprices (stock_id,price_date);


with the same data (5,000,000 price rows) I then get -

EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id = 20
I got - Total runtime: 6.397 ms

EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id IN
(10,25,36,45,86,154,368,481)
I got - Total runtime: 36.265 ms


Which is probably the speed you want ;-)



--

Shane Ambler
[EMAIL PROTECTED]

Get Sheeky @ http://Sheeky.Biz

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] Need help with complicated SQL statement

2007-11-17 Thread Shane Ambler

Ted Byers wrote:

Please consider the following statement (it becomes
obvious if you remember the important thing about the
table is that it has columns for each of stock_id,
price_date, and price).

(SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY
price_date DESC LIMIT 1)
UNION
(SELECT * FROM (SELECT * FROM stockprices WHERE
stock_id = 1 ORDER BY price_date DESC LIMIT 22) AS T2
ORDER BY T2.price_date ASC LIMIT 1)
UNION
(SELECT * FROM (SELECT * FROM stockprices WHERE
stock_id = 1 ORDER BY price_date DESC LIMIT 66) AS T3
ORDER BY T3.price_date ASC LIMIT 1)
UNION
(SELECT * FROM (SELECT * FROM stockprices WHERE
stock_id = 1 ORDER BY price_date DESC LIMIT 132) AS T4
ORDER BY T4.price_date ASC LIMIT 1)
UNION
(SELECT * FROM (SELECT * FROM stockprices WHERE
stock_id = 1 ORDER BY price_date DESC LIMIT 264) AS T5
ORDER BY T5.price_date ASC LIMIT 1);

This statement works flawlessly, and is blindingly
fast relative to everything else I have tried.  But I
am stuck.


I would have these subselects as -

UNION
(SELECT * FROM stockprices WHERE
 stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 264)

I would expect that to give the same result but make the query plan a 
bit simpler and quicker using less memory.



First, while this statement gets me the correct data,
I need to obtain a single record with stock_id,
current price (that obtained from the first select
statement in the union, and each of the prices
returned by the subsequent select statements as a the
current price minus the price at the previous date,
and the result divided by the price at the previous
date, expressed as a percentage.  I do not yet know
how to do this using SQL (it would be trivial if I
exported the data to Java or C++ - but it isn't clear
how to do it within SQL).


I haven't tested this but I would start with -

CREATE VIEW stock_price_combined AS
SELECT
stock_id

, (SELECT stock_price FROM stockprices WHERE stock_id = OT.stock_id 
ORDER BY price_date DESC LIMIT 1) as orig_price


, (SELECT stock_price FROM stockprices WHERE stock_id = OT.stock_id 
ORDER BY price_date DESC LIMIT 1 OFFSET 22) as price_two


, (SELECT stock_price FROM stockprices WHERE stock_id = OT.stock_id 
ORDER BY price_date DESC LIMIT 1 OFFSET 66) as price_three


, (SELECT stock_price FROM stockprices WHERE stock_id = OT.stock_id 
ORDER BY price_date DESC LIMIT 1 OFFSET 132) as price_four


, (SELECT stock_price FROM stockprices WHERE stock_id = OT.stock_id 
ORDER BY price_date DESC LIMIT 1 OFFSET 264) as price_five


FROM stock_prices OT;


Then you can -

SELECT
orig_price
, (orig_price - price_two) as price_increase
, ((orig_price - price_two)/price_two) as percentile
...
...

FROM stock_price_combined

WHERE stock_id in (SELECT stock_id FROM someTable WHERE ...)



To make things more difficult, suppose I have another
select statement that returns a set of stock_ids.  How
do I apply the SQL logic I require to only those
stocks in the set returned by a statement like SELECT
stock_id FROM someTable WHERE ...  The result of this
extension would be that I have one record for each
stock in the selected set of stocks.


SELECT * from stockprices WHERE stock_id in (SELECT stock_id FROM 
someTable WHERE ...)




If that isn't the answer you want I hope it points you in the right 
direction...



--

Shane Ambler
[EMAIL PROTECTED]

Get Sheeky @ http://Sheeky.Biz

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Need help with complicated SQL statement

2007-11-17 Thread Ted Byers

--- Shane Ambler [EMAIL PROTECTED] wrote:

 Ted Byers wrote:
  Please consider the following statement (it
 becomes
  obvious if you remember the important thing about
 the
  table is that it has columns for each of stock_id,
  price_date, and price).
  
  (SELECT * FROM stockprices WHERE stock_id = 1
 ORDER BY
  price_date DESC LIMIT 1)
  UNION
  (SELECT * FROM (SELECT * FROM stockprices WHERE
  stock_id = 1 ORDER BY price_date DESC LIMIT 22) AS
 T2
  ORDER BY T2.price_date ASC LIMIT 1)
  UNION
  (SELECT * FROM (SELECT * FROM stockprices WHERE
  stock_id = 1 ORDER BY price_date DESC LIMIT 66) AS
 T3
  ORDER BY T3.price_date ASC LIMIT 1)
  UNION
  (SELECT * FROM (SELECT * FROM stockprices WHERE
  stock_id = 1 ORDER BY price_date DESC LIMIT 132)
 AS T4
  ORDER BY T4.price_date ASC LIMIT 1)
  UNION
  (SELECT * FROM (SELECT * FROM stockprices WHERE
  stock_id = 1 ORDER BY price_date DESC LIMIT 264)
 AS T5
  ORDER BY T5.price_date ASC LIMIT 1);
  
  This statement works flawlessly, and is blindingly
  fast relative to everything else I have tried. 
 But I
  am stuck.
 
 I would have these subselects as -
 
 UNION
 (SELECT * FROM stockprices WHERE
   stock_id = 1 ORDER BY price_date DESC LIMIT 1
 OFFSET 264)
 
 I would expect that to give the same result but make
 the query plan a 
 bit simpler and quicker using less memory.
 
It gave apparently correct values, but for some
reason, it insisted on returning thousands upon
thousands of identical record.  There is something
awry there, but I can't place what.  Yes, I know I
could use SELECT DISTINCT, but I worry that it may be
doing a full table scan, as opposed to the relatively
direct lookup I came up with after looking at your
statement.  I don't yet know how long it would take
because it is the slowest option I tied, and I gave up
after it had returned over 10,000 rows and still
showed no signs of finishing.  I don't understand this
as explain returned apparently much better results for
yours than it did for mine.

My latest is as follows:

SELECT A1.stock_id,
   A1.price_date,
   A1.adjusted,
   A2.price_date AS pd22,
   100.0 * (A1.adjusted - A2.adjusted)/A2.adjusted
AS gl22pc,
   A3.price_date AS pd66,
   100.0 * (A1.adjusted - A3.adjusted)/A3.adjusted
AS gl66pc,
   A4.price_date AS pd132,
   100.0 * (A1.adjusted - A4.adjusted)/A4.adjusted
AS gl132pc,
   A5.price_date AS pd264,
   100.0 * (A1.adjusted - A5.adjusted)/A5.adjusted
AS gl264pc
FROM
(SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY
price_date DESC LIMIT 1) AS A1
LEFT JOIN
(SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY
price_date DESC LIMIT 1 OFFSET 22) AS A2
ON A1.stock_id = A2.stock_id
LEFT JOIN
(SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY
price_date DESC LIMIT 1 OFFSET 66) AS A3
ON A1.stock_id = A3.stock_id
LEFT JOIN
(SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY
price_date DESC LIMIT 1 OFFSET 132) AS A4
ON A1.stock_id = A4.stock_id
LEFT JOIN
(SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY
price_date DESC LIMIT 1 OFFSET 264) AS A5
ON A1.stock_id = A5.stock_id;

This still gives me the correct answer, but is faster
still than anything I came up with before.  

Now that I have the correct result for one stock, I
need to adapt it to apply to each stock individually,
in some small selection from a large number of stocks.

Thanks again

Ted

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[GENERAL] Need help with complicated SQL statement

2007-11-16 Thread Ted Byers
Please consider the following statement (it becomes
obvious if you remember the important thing about the
table is that it has columns for each of stock_id,
price_date, and price).

(SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY
price_date DESC LIMIT 1)
UNION
(SELECT * FROM (SELECT * FROM stockprices WHERE
stock_id = 1 ORDER BY price_date DESC LIMIT 22) AS T2
ORDER BY T2.price_date ASC LIMIT 1)
UNION
(SELECT * FROM (SELECT * FROM stockprices WHERE
stock_id = 1 ORDER BY price_date DESC LIMIT 66) AS T3
ORDER BY T3.price_date ASC LIMIT 1)
UNION
(SELECT * FROM (SELECT * FROM stockprices WHERE
stock_id = 1 ORDER BY price_date DESC LIMIT 132) AS T4
ORDER BY T4.price_date ASC LIMIT 1)
UNION
(SELECT * FROM (SELECT * FROM stockprices WHERE
stock_id = 1 ORDER BY price_date DESC LIMIT 264) AS T5
ORDER BY T5.price_date ASC LIMIT 1);

This statement works flawlessly, and is blindingly
fast relative to everything else I have tried.  But I
am stuck.

First, while this statement gets me the correct data,
I need to obtain a single record with stock_id,
current price (that obtained from the first select
statement in the union, and each of the prices
returned by the subsequent select statements as a the
current price minus the price at the previous date,
and the result divided by the price at the previous
date, expressed as a percentage.  I do not yet know
how to do this using SQL (it would be trivial if I
exported the data to Java or C++ - but it isn't clear
how to do it within SQL).

To make things more difficult, suppose I have another
select statement that returns a set of stock_ids.  How
do I apply the SQL logic I require to only those
stocks in the set returned by a statement like SELECT
stock_id FROM someTable WHERE ...  The result of this
extension would be that I have one record for each
stock in the selected set of stocks.

I do NOT want to have to recompute the set of stocks
for each of the select statements in the above union
(since that would be a waste because the resulting set
of stocks would always be the same for the given
criteria).  Nor do I want to apply the SQL logic I
need for the prices to all the stocks in the database.
 There could be thousands, or even tens of thousands,
of stocks represented in the database and I'd need the
gain/loss logic only for a few dozen at any given
time!

How do I make the two extensions I require?

I expect the SQL I get to be eventually placed in a
stored procedure, which may then be used to construct
a view, but that is the easy part.

Maybe I have been staring at this for too long to see
the obvious solution, but I am exhausted and am not
seeing the next step.  If there IS an obvious next
step, please at least give me a hint.

Thanks

Ted


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/