Re: [GENERAL] Need help with complicated SQL statement
Thanks Shane, It works reasonably well. It gets the right answer, but I guess my data set is much larger than your test. Please consider the appended data. The first two SQL statements are directly comparable. My Left join is marginally simpler, as shown by EXPLAIN, and runs to completion in about 3 seconds (elapsed real time), as opposed to about 30 seconds for the two views. It makes a little sense, though, in that according to EXPLAIN, the LEFT JOIN needs to consider a dramatically smaller number of rows. What I find puzzling, though, is that it takes less time to get the report for 28 stocks at a given time than it does to get the report for 1. (Both take about 30 seconds, but for 28 stocks, it takes about 0.005 seconds less time ;-) This is a case where LEFT JOINS appear to be much faster than subqueries. I appreciate all your help, but I am struggling to figure out how best to adapt my LEFT JOINs in your VIEWs, so that the latter benefit from the speed of the JOINs. The heart of my problem is to figure out how to use a stock_id in the WHERE clause. One thing I am not certain of is, Is there a way to preserve the logic of the WHERE clauses by replacing the WHERE clause, which I use to sample the time series at 22 days ago, 66 days ago, 132 days ago c., by a GROUP BY clause, grouping by stock_id? If so, might that, along with an additional LEFT JOIN, get me the result I am after? I created a stored procedure that takes an id argument (and can usefully invoke it on any stock_id in the database), but the problem remains as to how to construct a record set by applying the procedure to each id in a set of ids returned, e.g., by SELECT stock_id FROM stocks; Ted === test data = EXPLAIN SELECT A1.stock_id, A1.price_date, A1.adjusted, A2.price_date AS pd22, 100.0 * (A1.adjusted - A2.adjusted)/A2.adjusted AS gl22pc, A3.price_date AS pd66, 100.0 * (A1.adjusted - A3.adjusted)/A3.adjusted AS gl66pc, A4.price_date AS pd132, 100.0 * (A1.adjusted - A4.adjusted)/A4.adjusted AS gl132pc, A5.price_date AS pd264, 100.0 * (A1.adjusted - A5.adjusted)/A5.adjusted AS gl264pc FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1) AS A1 LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 22) AS A2 ON A1.stock_id = A2.stock_id LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 66) AS A3 ON A1.stock_id = A3.stock_id LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 132) AS A4 ON A1.stock_id = A4.stock_id LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 264) AS A5 ON A1.stock_id = A5.stock_id; ++-+-++---+-+-+--+---+-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | ++-+-++---+-+-+--+---+-+ | 1 | PRIMARY | derived2 | system | NULL | NULL| NULL| NULL | 1 | | | 1 | PRIMARY | derived3 | system | NULL | NULL| NULL| NULL | 1 | | | 1 | PRIMARY | derived4 | system | NULL | NULL| NULL| NULL | 1 | | | 1 | PRIMARY | derived5 | system | NULL | NULL| NULL| NULL | 1 | | | 1 | PRIMARY | derived6 | system | NULL | NULL| NULL| NULL | 1 | | | 6 | DERIVED | stockprices | ref| PRIMARY | PRIMARY | 4 | | 17442 | Using where | | 5 | DERIVED | stockprices | ref| PRIMARY | PRIMARY | 4 | | 17442 | Using where | | 4 | DERIVED | stockprices | ref| PRIMARY | PRIMARY | 4 | | 17442 | Using where | | 3 | DERIVED | stockprices | ref| PRIMARY | PRIMARY | 4 | | 17442 | Using where | | 2 | DERIVED | stockprices | ref| PRIMARY | PRIMARY | 4 | | 17442 | Using where | ++-+-++---+-+-+--+---+-+ 10 rows in set (0.08 sec) EXPLAIN SELECT * FROM stock_price_history WHERE stock_id = 1; +++-+---+---+-+-+---+++ | id | select_type| table | type | possible_keys | key | key_len | ref | rows | Extra | +++-+---+---+-+-+---+++ | 1 | PRIMARY| derived3 | ALL | NULL | NULL| NULL| NULL | 494 | Using
Re: [GENERAL] Need help with complicated SQL statement
Ted Byers wrote: Thanks Shane, It works reasonably well. It gets the right answer, but I guess my data set is much larger than your test. What indexes have you got? Using this index on the sample I sent gets the response time to about 5ms (per stock_id) (as opposed to 900ms with these columns indexed separately) CREATE INDEX idx_stockprices_id_date ON stockprices (stock_id,price_date); Please consider the appended data. The first two SQL statements are directly comparable. My Left join is marginally simpler, as shown by EXPLAIN, and runs to completion in about 3 seconds (elapsed real time), as opposed to about 30 seconds for the two views. It makes a little sense, though, in that according to EXPLAIN, the LEFT JOIN needs to consider a dramatically smaller number of rows. What I find puzzling, though, is that it takes less time to get the report for 28 stocks at a given time than it does to get the report for 1. (Both take about 30 seconds, but for 28 stocks, it takes about 0.005 seconds less time ;-) This is a case where LEFT JOINS appear to be much faster than subqueries. I appreciate all your help, but I am struggling to figure out how best to adapt my LEFT JOINs in your VIEWs, so that the latter benefit from the speed of the JOINs. The heart of my problem is to figure out how to use a stock_id in the WHERE clause. That is where I have moved away from your select - the way you are joining makes it hard to adapt to where you want it to end up (more than one stock_id per query) By using the view that generates the rows you want you make the query sent from the client so much simpler and make it easy to get any single or list of stock_id you want. The second view will add little overhead and can be part of the select sent from the client if you wish. I separated them in to two views to prevent duplicating the same selects for the calculations. You can merge them into one view if you wish - the first view would become a subselect for the second view. One thing I am not certain of is, Is there a way to preserve the logic of the WHERE clauses by replacing the WHERE clause, which I use to sample the time series at 22 days ago, 66 days ago, 132 days ago c., by a GROUP BY clause, grouping by stock_id? If so, might that, along with an additional LEFT JOIN, get me the result I am after? I created a stored procedure that takes an id argument (and can usefully invoke it on any stock_id in the database), but the problem remains as to how to construct a record set by applying the procedure to each id in a set of ids returned, e.g., by SELECT stock_id FROM stocks; Ted -- Shane Ambler [EMAIL PROTECTED] Get Sheeky @ http://Sheeky.Biz ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Need help with complicated SQL statement
Ted Byers wrote: It gave apparently correct values, but for some reason, it insisted on returning thousands upon thousands of identical record. There is something awry there, but I can't place what. Yes, I know I could use SELECT DISTINCT, but I worry that it may be doing a full table scan, as opposed to the relatively direct lookup I came up with after looking at your statement. I don't yet know how long it would take because it is the slowest option I tied, and I gave up after it had returned over 10,000 rows and still showed no signs of finishing. I don't understand this as explain returned apparently much better results for yours than it did for mine. Now that I look at it again today I see that - you would either need to use SELECT DISTINCT(stock_id) in the VIEW definition or select the stock_id from the stock table instead of the stockprices table. I set up a little test this time - this is the example I came up with - CREATE DATABASE stocktest; \c stocktest CREATE TABLE stocks ( id serial PRIMARY KEY, description text ); CREATE TABLE stockprices ( id serial PRIMARY KEY, stock_id integer REFERENCES stocks (id), stock_price numeric, price_date date ); CREATE INDEX idx_stockprices_date ON stockprices (price_date); CREATE INDEX idx_stockprices_stock_id ON stockprices (stock_id); CREATE VIEW stock_prices_combined AS SELECT id AS stock_id , (SELECT stock_price FROM stockprices WHERE stock_id = ST.id ORDER BY price_date DESC LIMIT 1) as one_adjusted , (SELECT price_date FROM stockprices WHERE stock_id = ST.id ORDER BY price_date DESC LIMIT 1) as one_date , (SELECT stock_price FROM stockprices WHERE stock_id = ST.id ORDER BY price_date DESC LIMIT 1 OFFSET 22) as two_adjusted , (SELECT price_date FROM stockprices WHERE stock_id = ST.id ORDER BY price_date DESC LIMIT 1 OFFSET 22) as two_date , (SELECT stock_price FROM stockprices WHERE stock_id = ST.id ORDER BY price_date DESC LIMIT 1 OFFSET 66) as three_adjusted , (SELECT price_date FROM stockprices WHERE stock_id = ST.id ORDER BY price_date DESC LIMIT 1 OFFSET 66) as three_date , (SELECT stock_price FROM stockprices WHERE stock_id = ST.id ORDER BY price_date DESC LIMIT 1 OFFSET 132) as four_adjusted , (SELECT price_date FROM stockprices WHERE stock_id = ST.id ORDER BY price_date DESC LIMIT 1 OFFSET 132) as four_date , (SELECT stock_price FROM stockprices WHERE stock_id = ST.id ORDER BY price_date DESC LIMIT 1 OFFSET 264) as five_adjusted , (SELECT price_date FROM stockprices WHERE stock_id = ST.id ORDER BY price_date DESC LIMIT 1 OFFSET 264) as five_date FROM stocks ST; CREATE VIEW stock_price_history AS SELECT stock_id, one_date AS pd1, one_adjusted AS current_price, two_date AS pd22, 100.0 * (one_adjusted - two_adjusted)/two_adjusted AS gl22pc, three_date AS pd66, 100.0 * (one_adjusted - three_adjusted)/three_adjusted AS gl66pc, four_date AS pd132, 100.0 * (one_adjusted - four_adjusted)/four_adjusted AS gl132pc, five_date AS pd264, 100.0 * (one_adjusted - five_adjusted)/five_adjusted AS gl264pc FROM stock_prices_combined; I INSERTed 500 stocks entries and 10,000 stockprices entries for each stock (that's 5,000,000 price rows), then from EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id = 20 I got - Total runtime: 981.618 ms EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id IN (10,25,36,45,86,154,368,481) I got - Total runtime: 8084.217 ms So that's about 1 second per stock_id returned (on my old machine). You can change that last query to be - SELECT * FROM stock_price_history WHERE stock_id IN (SELECT stock_id FROM sometable WHERE ) Which gives you the range of stock_id's from a table that you asked about. -- Shane Ambler [EMAIL PROTECTED] Get Sheeky @ http://Sheeky.Biz ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Need help with complicated SQL statement
Shane Ambler wrote: I INSERTed 500 stocks entries and 10,000 stockprices entries for each stock (that's 5,000,000 price rows), then from EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id = 20 I got - Total runtime: 981.618 ms EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id IN (10,25,36,45,86,154,368,481) I got - Total runtime: 8084.217 ms Actually I found a better way - after you run the example I gave you before - DROP INDEX idx_stockprices_date DROP INDEX idx_stockprices_stock_id CREATE INDEX idx_stockprices_id_date ON stockprices (stock_id,price_date); with the same data (5,000,000 price rows) I then get - EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id = 20 I got - Total runtime: 6.397 ms EXPLAIN ANALYSE SELECT * FROM stock_price_history WHERE stock_id IN (10,25,36,45,86,154,368,481) I got - Total runtime: 36.265 ms Which is probably the speed you want ;-) -- Shane Ambler [EMAIL PROTECTED] Get Sheeky @ http://Sheeky.Biz ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Need help with complicated SQL statement
Ted Byers wrote: Please consider the following statement (it becomes obvious if you remember the important thing about the table is that it has columns for each of stock_id, price_date, and price). (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 22) AS T2 ORDER BY T2.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 66) AS T3 ORDER BY T3.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 132) AS T4 ORDER BY T4.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 264) AS T5 ORDER BY T5.price_date ASC LIMIT 1); This statement works flawlessly, and is blindingly fast relative to everything else I have tried. But I am stuck. I would have these subselects as - UNION (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 264) I would expect that to give the same result but make the query plan a bit simpler and quicker using less memory. First, while this statement gets me the correct data, I need to obtain a single record with stock_id, current price (that obtained from the first select statement in the union, and each of the prices returned by the subsequent select statements as a the current price minus the price at the previous date, and the result divided by the price at the previous date, expressed as a percentage. I do not yet know how to do this using SQL (it would be trivial if I exported the data to Java or C++ - but it isn't clear how to do it within SQL). I haven't tested this but I would start with - CREATE VIEW stock_price_combined AS SELECT stock_id , (SELECT stock_price FROM stockprices WHERE stock_id = OT.stock_id ORDER BY price_date DESC LIMIT 1) as orig_price , (SELECT stock_price FROM stockprices WHERE stock_id = OT.stock_id ORDER BY price_date DESC LIMIT 1 OFFSET 22) as price_two , (SELECT stock_price FROM stockprices WHERE stock_id = OT.stock_id ORDER BY price_date DESC LIMIT 1 OFFSET 66) as price_three , (SELECT stock_price FROM stockprices WHERE stock_id = OT.stock_id ORDER BY price_date DESC LIMIT 1 OFFSET 132) as price_four , (SELECT stock_price FROM stockprices WHERE stock_id = OT.stock_id ORDER BY price_date DESC LIMIT 1 OFFSET 264) as price_five FROM stock_prices OT; Then you can - SELECT orig_price , (orig_price - price_two) as price_increase , ((orig_price - price_two)/price_two) as percentile ... ... FROM stock_price_combined WHERE stock_id in (SELECT stock_id FROM someTable WHERE ...) To make things more difficult, suppose I have another select statement that returns a set of stock_ids. How do I apply the SQL logic I require to only those stocks in the set returned by a statement like SELECT stock_id FROM someTable WHERE ... The result of this extension would be that I have one record for each stock in the selected set of stocks. SELECT * from stockprices WHERE stock_id in (SELECT stock_id FROM someTable WHERE ...) If that isn't the answer you want I hope it points you in the right direction... -- Shane Ambler [EMAIL PROTECTED] Get Sheeky @ http://Sheeky.Biz ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Need help with complicated SQL statement
--- Shane Ambler [EMAIL PROTECTED] wrote: Ted Byers wrote: Please consider the following statement (it becomes obvious if you remember the important thing about the table is that it has columns for each of stock_id, price_date, and price). (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 22) AS T2 ORDER BY T2.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 66) AS T3 ORDER BY T3.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 132) AS T4 ORDER BY T4.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 264) AS T5 ORDER BY T5.price_date ASC LIMIT 1); This statement works flawlessly, and is blindingly fast relative to everything else I have tried. But I am stuck. I would have these subselects as - UNION (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 264) I would expect that to give the same result but make the query plan a bit simpler and quicker using less memory. It gave apparently correct values, but for some reason, it insisted on returning thousands upon thousands of identical record. There is something awry there, but I can't place what. Yes, I know I could use SELECT DISTINCT, but I worry that it may be doing a full table scan, as opposed to the relatively direct lookup I came up with after looking at your statement. I don't yet know how long it would take because it is the slowest option I tied, and I gave up after it had returned over 10,000 rows and still showed no signs of finishing. I don't understand this as explain returned apparently much better results for yours than it did for mine. My latest is as follows: SELECT A1.stock_id, A1.price_date, A1.adjusted, A2.price_date AS pd22, 100.0 * (A1.adjusted - A2.adjusted)/A2.adjusted AS gl22pc, A3.price_date AS pd66, 100.0 * (A1.adjusted - A3.adjusted)/A3.adjusted AS gl66pc, A4.price_date AS pd132, 100.0 * (A1.adjusted - A4.adjusted)/A4.adjusted AS gl132pc, A5.price_date AS pd264, 100.0 * (A1.adjusted - A5.adjusted)/A5.adjusted AS gl264pc FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1) AS A1 LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 22) AS A2 ON A1.stock_id = A2.stock_id LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 66) AS A3 ON A1.stock_id = A3.stock_id LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 132) AS A4 ON A1.stock_id = A4.stock_id LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 264) AS A5 ON A1.stock_id = A5.stock_id; This still gives me the correct answer, but is faster still than anything I came up with before. Now that I have the correct result for one stock, I need to adapt it to apply to each stock individually, in some small selection from a large number of stocks. Thanks again Ted ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[GENERAL] Need help with complicated SQL statement
Please consider the following statement (it becomes obvious if you remember the important thing about the table is that it has columns for each of stock_id, price_date, and price). (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 22) AS T2 ORDER BY T2.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 66) AS T3 ORDER BY T3.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 132) AS T4 ORDER BY T4.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 264) AS T5 ORDER BY T5.price_date ASC LIMIT 1); This statement works flawlessly, and is blindingly fast relative to everything else I have tried. But I am stuck. First, while this statement gets me the correct data, I need to obtain a single record with stock_id, current price (that obtained from the first select statement in the union, and each of the prices returned by the subsequent select statements as a the current price minus the price at the previous date, and the result divided by the price at the previous date, expressed as a percentage. I do not yet know how to do this using SQL (it would be trivial if I exported the data to Java or C++ - but it isn't clear how to do it within SQL). To make things more difficult, suppose I have another select statement that returns a set of stock_ids. How do I apply the SQL logic I require to only those stocks in the set returned by a statement like SELECT stock_id FROM someTable WHERE ... The result of this extension would be that I have one record for each stock in the selected set of stocks. I do NOT want to have to recompute the set of stocks for each of the select statements in the above union (since that would be a waste because the resulting set of stocks would always be the same for the given criteria). Nor do I want to apply the SQL logic I need for the prices to all the stocks in the database. There could be thousands, or even tens of thousands, of stocks represented in the database and I'd need the gain/loss logic only for a few dozen at any given time! How do I make the two extensions I require? I expect the SQL I get to be eventually placed in a stored procedure, which may then be used to construct a view, but that is the easy part. Maybe I have been staring at this for too long to see the obvious solution, but I am exhausted and am not seeing the next step. If there IS an obvious next step, please at least give me a hint. Thanks Ted ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/