Re: [PERFORM] Random Page Cost and Planner
Hi, Bryan.

I was just about to reply to the thread, thanks for asking. Clustering was key. After rebooting the machine (just to make sure absolutely nothing was cached), I immediately ran a report on Toronto: 5.25 seconds!

Here's what I did:

1. Created a new set of tables that matched the old set, with statistics of 1000 on the station and taken (date) columns.
2. Inserted the data from the old hierarchy into the new set, ordered by station id then by date (same seven child tables as before: one per category). I wanted to ensure a strong correlation between primary key and station id.
3. Added three indexes per table: (a) station id; (b) date taken; and (c) station-taken-category.
4. Set the station-taken-category index as CLUSTER.
5. Vacuumed the new tables.
6. Dropped the old tables.
7. Set the following configuration values:
   - shared_buffers = 1GB
   - temp_buffers = 32MB
   - work_mem = 32MB
   - maintenance_work_mem = 64MB
   - seq_page_cost = 1.0
   - random_page_cost = 2.0
   - cpu_index_tuple_cost = 0.001
   - effective_cache_size = 512MB

I ran a few more reports (no reboots, but reading vastly different data sets):

- Vancouver: 4.2s
- Yellowknife: 1.7s
- Montreal: 6.5s
- Trois-Riviers: 2.8s

No full table scans. I imagine some indexes are not strictly necessary and will test to see which can be removed (my guess: the station and taken indexes). The problem was that the station ids were scattered and so PostgreSQL presumed a full table scan would be faster. Physically ordering the data by station ids triggers index use every time.

Next week's hardware upgrade should halve those times -- unless anyone has further suggestions to squeeze more performance out of PG. ;-)

Dave
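For anyone wanting to reproduce the clustering steps above, they boil down to something like the following -- a minimal sketch only, assuming the measurement_001 child table and the column names used elsewhere in this thread:

  -- Step 1: raise the statistics targets on the station and taken columns.
  ALTER TABLE climate.measurement_001 ALTER COLUMN station_id SET STATISTICS 1000;
  ALTER TABLE climate.measurement_001 ALTER COLUMN taken      SET STATISTICS 1000;

  -- Step 3 (c): the composite index used for clustering.
  CREATE INDEX measurement_001_stc_idx
    ON climate.measurement_001 (station_id, taken, category_id);

  -- Step 4: mark the index for CLUSTER and rewrite the table in that physical order.
  ALTER TABLE climate.measurement_001 CLUSTER ON measurement_001_stc_idx;
  CLUSTER climate.measurement_001;

  -- Step 5: refresh the planner statistics.
  VACUUM ANALYZE climate.measurement_001;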
Re: [PERFORM] Random Page Cost and Planner
2010/5/27 David Jarvis thanga...@gmail.com:
> Clustering was key. After rebooting the machine (just to make sure
> absolutely nothing was cached), I immediately ran a report on Toronto:
> 5.25 seconds!
> [... clustering steps and configuration values snipped ...]
> No full table scans. The problem was that the station ids were scattered
> and so PostgreSQL presumed a full table scan would be faster. Physically
> ordering the data by station ids triggers index use every time.

I wonder what the plan will be if you replace sc.taken_* in:

    m.taken BETWEEN sc.taken_start AND sc.taken_end

by values. It might help the planner...

Also, I'd consider an explicit ordered join, but I admit I haven't read the whole thread (in particular the table size).

Oh, and I'd set statistics to a higher value for the category_id column of the station_category table (seeing the same request and its EXPLAIN ANALYZE without the date in the query would help).

--
Cédric Villemain
2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
Re: [PERFORM] Random Page Cost and Planner
Salut, Cédric.

> I wonder what the plan will be if you replace sc.taken_* in:
>
>     m.taken BETWEEN sc.taken_start AND sc.taken_end
>
> by values. It might help the planner...

That is a fairly important restriction. I will try making it (year1||'-01-01')::date, but I have no constant value for it -- it is a user-supplied parameter. And then there's the year-wrapping problem, too, where the ending year will differ from the starting year in certain cases. (Like querying rows between Dec 22, 1900 and Mar 22, 1901 rather than between Mar 22, 1900 and Dec 22, 1900. The first query is the winter season and the second query is all seasons except winter.)

> Also, I'll consider explicit ordered join but I admit I haven't read the
> whole thread (in particular the table size).

It's a big table: almost 40 million rows, and there are seven tables like it. I tried an explicit join in the past: it did not help much. But that was before everything was running this fast, so now that the system performs differently, maybe it will help?

Dave
Re: [PERFORM] Random Page Cost and Planner
2010/5/27 David Jarvis thanga...@gmail.com:
> That is a fairly important restriction. I will try making it
> (year1||'-01-01')::date, but I have no constant value for it -- it is a
> user-supplied parameter. And then there's the year-wrapping problem, too,
> where the ending year will differ from the starting year in certain cases.

Ah, I thought that you had a start and an end provided (so you could put them in the query).

> It's a big table: almost 40 million rows, and there are seven tables like
> it. I tried an explicit join in the past: it did not help much. But that
> was before everything was running this fast, so now that the system
> performs differently, maybe it will help?

Yes. The documentation is good for this topic:
http://www.postgresql.org/docs/8.4/interactive/explicit-joins.html

Consider the parameter that controls explicit join order (you can set it per SQL session). You know your data, so you know which tables return the fewest rows and should be joined first. ;)

--
Cédric Villemain
2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
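The per-session parameter Cédric is referring to is presumably join_collapse_limit; a rough sketch of pinning the written join order for one session only (table and column names follow the queries quoted elsewhere in this thread):

  -- Honour the join order exactly as written, for this session only.
  SET join_collapse_limit = 1;

  SELECT extract(YEAR FROM m.taken) AS year, avg(m.amount) AS amount
  FROM climate.station_category sc              -- most selective table first
  JOIN climate.measurement m
    ON m.station_id  = sc.station_id
   AND m.category_id = sc.category_id
   AND m.taken BETWEEN sc.taken_start AND sc.taken_end
  WHERE sc.category_id = 7
  GROUP BY extract(YEAR FROM m.taken);

  RESET join_collapse_limit;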
Re: [PERFORM] Random Page Cost and Planner
Agree with Tom on his point about avoiding cost parameter adjustments to fit specific test cases.

A few suggestions, as I assume you own this database:

- Check out pg_statio_user_tables -- optimize your cache hit ratio on blocks read over different time durations; snapshot pg_stat_bgwriter from a script or something.
- pg_buffercache in contrib/.
- /proc/meminfo on Linux -- find out exactly what is going on with your kernel buffer cache (size, how it is buffering), and whether your controller or drive is using a read-ahead cache.
- You might want to play around with partial indexes and/or range partitioning with exclusion constraints, etc.
- Define the I/O characteristics of the dataset, taking into account index clustering and index order on in-memory pages (i.e. re-cluster?); why the need for multiple indexes if you are clustering the heap on one of them?
- Solidify the referential integrity constraints between those tables on paper; define the use cases before modifying the database tables (I assume this is a dev database).
- Linux fs mount options to explore -- i.e. noatime, writeback, etc.
- Maybe look at prepared statements if you are running a lot of similar queries from a single session? I am assuming a web front end for your db -- with, say, frequently queried region/category/dates for a large read-only dataset with multiple join conditions.

There are some good presentations on pgcon.org from PGCon 2010, which was held last week:
http://www.pgcon.org/2010/schedule/events/218.en.html

If you take everything into account and model it correctly (not too loose, not too tight), your solution will be reusable and will save time and hardware expenses.

Regards -

Bryan

On Thu, May 27, 2010 at 2:43 AM, David Jarvis thanga...@gmail.com wrote:
> Hi, Bryan.
>
> I was just about to reply to the thread, thanks for asking. Clustering was
> key. After rebooting the machine (just to make sure absolutely nothing was
> cached), I immediately ran a report on Toronto: 5.25 seconds!
> [... clustering steps, configuration values, and report timings snipped ...]
>
> Dave
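Of the suggestions above, the pg_statio_user_tables check is the quickest to try; a sketch of a cache-hit query against it (the climate schema name is this thread's, the view and its columns are standard):

  -- Per-table cache hit ratio: blocks found in shared_buffers versus blocks
  -- that had to be requested from the OS or disk.
  SELECT relname,
         heap_blks_read,
         heap_blks_hit,
         round(heap_blks_hit::numeric
               / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
  FROM pg_statio_user_tables
  WHERE schemaname = 'climate'
  ORDER BY heap_blks_read DESC;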
Re: [PERFORM] Random Page Cost and Planner
Hi, Bryan.

Thanks for the notes. I thought about using a prepared statement, but I cannot find any examples of using a PREPARE statement from within a function, and don't really feel like tinkering around to figure it out.

Performance is at the point where the Java/PHP bridge and JasperReports are the bottlenecks. The run_time variable seldom goes beyond 2.6s now, and the reports take about 5-6 seconds to appear. At this point I'm into diminishing returns: I can perform a 60-minute hardware upgrade or spend 12 hours profiling for less than the same net effect (and there is no guarantee I can improve the performance in fewer than 12 hours -- it took me 17 days and countless e-mails to this mailing group just to get this far -- thank you again for all the help, by the way).

(If I were a PostgreSQL guru like most people on this list, it might take me 2 hours of profiling to optimize away the remaining bottlenecks, but even then the gain would only be a second or two in the database arena; the other system components would also gain from a hardware upgrade.)

Dave
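On the prepared-statement point: queries written directly inside a PL/pgSQL function are already plan-cached, but at the SQL level the inner query could be prepared explicitly. A sketch only, using the sample station, date, and category values that appear elsewhere in the thread:

  PREPARE yearly_avg (integer, date, date, integer) AS
    SELECT extract(YEAR FROM m.taken) AS year, avg(m.amount) AS amount
    FROM climate.measurement m
    WHERE m.station_id = $1
      AND m.taken BETWEEN $2 AND $3
      AND m.category_id = $4
    GROUP BY extract(YEAR FROM m.taken);

  EXECUTE yearly_avg(754, '1980-08-01', '2001-11-30', 7);
  DEALLOCATE yearly_avg;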
Re: [PERFORM] Random Page Cost and Planner
Hi, Rob.

I tried bumping the effective_cache_size. It made no difference.

My latest attempt at forcing PostgreSQL to use the indexes involved two loops: one to loop over the stations, the other to extract the station data from the measurement table. The outer loop executes in 1.5 seconds. The inner loop does a full table scan for each record in the outer loop:

  FOR station IN
    SELECT
      sc.station_id,
      sc.taken_start,
      sc.taken_end
    FROM
      climate.city c,
      climate.station s,
      climate.station_category sc
    WHERE
      c.id = city_id AND
      earth_distance(
        ll_to_earth(c.latitude_decimal, c.longitude_decimal),
        ll_to_earth(s.latitude_decimal, s.longitude_decimal)) / 1000 <= radius AND
      s.elevation BETWEEN elevation1 AND elevation2 AND
      s.applicable AND
      sc.station_id = s.id AND
      sc.category_id = category_id AND
      extract(YEAR FROM sc.taken_start) >= year1 AND
      extract(YEAR FROM sc.taken_end) <= year2
    ORDER BY
      sc.station_id
  LOOP
    RAISE NOTICE 'B.1. % % %', station.station_id, station.taken_start, station.taken_end;

    FOR measure IN
      SELECT
        extract(YEAR FROM m.taken) AS year,
        avg(m.amount) AS amount
      FROM
        climate.measurement m
      WHERE
        m.station_id = station.station_id AND
        m.taken BETWEEN station.taken_start AND station.taken_end AND
        m.category_id = category_id
      GROUP BY
        extract(YEAR FROM m.taken)
    LOOP
      RAISE NOTICE '  B.2. % %', measure.year, measure.amount;
    END LOOP;
  END LOOP;

I thought that the conditions in the inner query's WHERE clause (station id, taken, and category id) would have evoked index use. The values used for the inner query:

  NOTICE: B.1. 754 1980-08-01 2001-11-30

When I run the query manually, using constants, it executes in ~25 milliseconds:

  SELECT
    extract(YEAR FROM m.taken) AS year,
    avg(m.amount) AS amount
  FROM
    climate.measurement m
  WHERE
    m.station_id = 754 AND
    m.taken BETWEEN '1980-08-01'::date AND '2001-11-30'::date AND
    m.category_id = 7
  GROUP BY
    extract(YEAR FROM m.taken)

With 106 rows it should execute in ~2.65 seconds, which is better than the 5 seconds I get when everything is cached and a tremendous improvement over the ~85 seconds from cold.

I do not understand why the below query uses a full table scan (executes in ~13 seconds):

  SELECT
    extract(YEAR FROM m.taken) AS year,
    avg(m.amount) AS amount
  FROM
    climate.measurement m
  WHERE
    m.station_id = station.station_id AND
    m.taken BETWEEN station.taken_start AND station.taken_end AND
    m.category_id = category_id
  GROUP BY
    extract(YEAR FROM m.taken)

Moreover, what can I do to solve the problem?

Thanks again!

Dave
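One way to give the planner the actual values inside the function is to build the inner query as dynamic SQL, so it is planned afresh with real constants on every iteration of the outer loop. A sketch only, reusing the loop variables from the function above:

  -- Interpolating the values (rather than relying on plan-time parameters)
  -- lets the planner estimate row counts for this specific station and range.
  FOR measure IN EXECUTE
    'SELECT extract(YEAR FROM m.taken) AS year, avg(m.amount) AS amount '
    || 'FROM climate.measurement m '
    || 'WHERE m.station_id = '  || station.station_id
    || ' AND m.taken BETWEEN '  || quote_literal(station.taken_start::text)
    || ' AND '                  || quote_literal(station.taken_end::text)
    || ' AND m.category_id = '  || category_id
    || ' GROUP BY extract(YEAR FROM m.taken)'
  LOOP
    RAISE NOTICE '  B.2. % %', measure.year, measure.amount;
  END LOOP;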
Re: [PERFORM] Random Page Cost and Planner
On May 26, 2010, at 6:50 AM, David Jarvis wrote:

> That said, when using the following condition, the query is fast (1 second):
>
>     extract(YEAR FROM sc.taken_start) >= 1963 AND
>     extract(YEAR FROM sc.taken_end) <= 2009 AND
>
>   -> Index Scan using measurement_013_stc_idx on measurement_013 m
>      (cost=0.00..511.00 rows=511 width=15) (actual time=0.018..3.601 rows=3356 loops=104)
>        Index Cond: ((m.station_id = sc.station_id) AND (m.taken >= sc.taken_start) AND
>                     (m.taken <= sc.taken_end) AND (m.category_id = 7))
>
> This condition makes it slow (13 seconds on first run, 8 seconds thereafter):
>
>     extract(YEAR FROM sc.taken_start) >= 1900 AND
>     extract(YEAR FROM sc.taken_end) <= 2009 AND
>
>        Filter: (category_id = 7)
>   -> Seq Scan on measurement_013 m
>      (cost=0.00..359704.80 rows=18118464 width=15) (actual time=0.008..4025.692 rows=18118395 loops=1)
>
> At this point, I'm tempted to write a stored procedure that iterates over
> each station category for all the years of each station. My guess is that
> the planner's estimate for the number of rows that will be returned by
> extract(YEAR FROM sc.taken_start) >= 1900 is incorrect and so it chooses a
> full table scan for all rows.

Nope, it appears that the planner estimate is correct (it estimates 18118464 rows vs 18118395 real rows). I think what's happening there is that 18M rows is a large enough part of the total table rows that it makes sense to scan it sequentially (eliminating random access costs). Try SET enable_seqscan = false and repeat the query -- there is a chance that the index scan would be even slower.

> The part I am having trouble with is convincing PG to use the index for the
> station ID and the date range for when the station was active. Each station
> has a unique ID; the data in the measurement table is ordered by measurement
> date then by station. Should I add a clustered index by station then by
> date? Any other suggestions are very much appreciated.

Is it necessary to get the data as far back as 1900 all the time? Maybe there is a possibility to aggregate results from the past years if they are constant.

Regards,
--
Alexey Klyukin al...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
Re: [PERFORM] Random Page Cost and Planner
From: t...@fuzzy.cz
Date: Wed, May 26, 2010 12:01 pm
To: David Jarvis thanga...@gmail.com

> Yes, that is what happened, making the tests rather meaningless, and giving
> me the false impression that the indexes were being used. They were, but
> only because of cached results.
>
> That said, when using the following condition, the query is fast (1 second):
>
>     extract(YEAR FROM sc.taken_start) >= 1963 AND
>     extract(YEAR FROM sc.taken_end) <= 2009 AND
>
>   -> Index Scan using measurement_013_stc_idx on measurement_013 m
>      (cost=0.00..511.00 rows=511 width=15) (actual time=0.018..3.601 rows=3356 loops=104)
>        Index Cond: ((m.station_id = sc.station_id) AND (m.taken >= sc.taken_start) AND
>                     (m.taken <= sc.taken_end) AND (m.category_id = 7))
>
> This condition makes it slow (13 seconds on first run, 8 seconds thereafter):
>
>     extract(YEAR FROM sc.taken_start) >= 1900 AND
>     extract(YEAR FROM sc.taken_end) <= 2009 AND
>
>        Filter: (category_id = 7)
>   -> Seq Scan on measurement_013 m
>      (cost=0.00..359704.80 rows=18118464 width=15) (actual time=0.008..4025.692 rows=18118395 loops=1)
>
> My guess is that the planner's estimate for the number of rows that will be
> returned by extract(YEAR FROM sc.taken_start) >= 1900 is incorrect and so it
> chooses a full table scan for all rows. [...]
>
> The part I am having trouble with is convincing PG to use the index for the
> station ID and the date range for when the station was active. Each station
> has a unique ID; the data in the measurement table is ordered by measurement
> date then by station.

Well, don't forget indexes may not be the best way to evaluate the query -- if the selectivity is low (the query returns a large portion of the table), the sequential scan is actually faster. The problem is that using an index means you have to read the index blocks too, and then the table blocks, and this is actually random access. So your belief that the query will run faster thanks to using indexes could be false.

And this is what happens in the queries above -- the first query covers years 1963-2009, while the second one covers 1900-2009. Given the fact this table contains ~40m rows, the first query returns about 0.01% (3k rows) while the second one returns almost 50% of the data (18m rows). So I doubt this might be improved using an index ...

But you can try that by setting enable_seqscan=off or by proper settings of the random_page_cost / seq_page_cost variables (so that the plan with indexes is cheaper than the sequential scan). You can do that in the session (e.g. use SET enable_seqscan=off) so that you won't harm other sessions.

> Should I add a clustered index by station then by date? Any other
> suggestions are very much appreciated.
Well, the only thing that crossed my mind is partitioning with properly defined constraints and constraint_exclusion=on. I'd recommend partitioning by time (each year a separate partition) but you'll have to investigate that on your own (it depends on your use cases).

BTW the effective_cache_size mentioned in the previous posts is just an 'information parameter' -- it does not increase the amount of memory allocated by PostgreSQL. It merely informs PostgreSQL of the expected disk cache size maintained by the OS (Linux), so that PostgreSQL may estimate the chance that the requested data are actually cached (and won't be read from the disk).

regards
Tomas
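A sketch of the session-local experiment described above; nothing here persists beyond the session, and the literal values are the sample ones from elsewhere in the thread:

  SET enable_seqscan = off;      -- discourage sequential scans entirely
  -- or, less bluntly, make index access look cheaper:
  SET random_page_cost = 1.1;
  SET seq_page_cost = 1.0;

  EXPLAIN ANALYZE
  SELECT extract(YEAR FROM m.taken) AS year, avg(m.amount) AS amount
  FROM climate.measurement m
  WHERE m.station_id = 754
    AND m.taken BETWEEN '1980-08-01' AND '2001-11-30'
    AND m.category_id = 7
  GROUP BY extract(YEAR FROM m.taken);

  RESET enable_seqscan;
  RESET random_page_cost;
  RESET seq_page_cost;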
Re: [PERFORM] Random Page Cost and Planner
David Jarvis thanga...@gmail.com wrote:

>> It sounds as though the active portion of your database is pretty
>> much cached in RAM. True?
>
> I would not have thought so; there are seven tables, each with 39 to
> 43 million rows
>
> The machine has 4GB of RAM

In that case, modifying seq_page_cost or setting random_page_cost below something in the range of 1.5 to 2 is probably not going to be a good choice for the mix as a whole.

> effective_cache_size = 256MB

This should probably be set to something on the order of 3GB. This will help the optimizer make more intelligent choices about when use of the index will be a win.

>> It would tend to be better than random access to 43 million rows, at
>> least if you need to go to disk for many of them.
>
> I thought that the index would take care of this?

When the index can limit the number of rows to a fraction of the 43 million rows, using it is a win. The trick is to accurately model the relative costs of different aspects of running the query, so that when the various plans are compared, the one which looks the cheapest actually *is*. Attempting to force any particular plan through other means is risky.

> I will be trying various other indexes. I've noticed now that
> sometimes the results are very quick and sometimes very slow. For the
> query I posted, it would be great to know what would be the best
> indexes to use. I have a suspicion that that's going to require trial
> and many errors.

Yeah, there's no substitute for testing your actual software against the actual data. Be careful, though -- as previously mentioned, caching can easily distort results, particularly when you run the same query, all by itself (with no competing queries) multiple times. You'll get your best information if you can simulate a more-or-less realistic load, and try that with various settings and indexes. The cache turnover and resource contention involved in production can influence performance, and are hard to estimate any other way.

-Kevin
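effective_cache_size can be changed without a restart, so Kevin's 3GB suggestion is cheap to test -- a sketch, with the exact value being the assumption:

  # postgresql.conf (roughly three quarters of the 4GB machine)
  effective_cache_size = 3GB

  -- then, from psql, pick up the change without restarting:
  SELECT pg_reload_conf();
  SHOW effective_cache_size;

  -- or try it for a single session only:
  SET effective_cache_size = '3GB';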
Re: [PERFORM] Random Page Cost and Planner
Hi, Alexey.

> Is it necessary to get the data as far back as 1900 all the time? Maybe
> there is a possibility to aggregate results from the past years if they
> are constant.

This I have done. I created another table (station_category) that associates stations with when they started to take measurements and when they stopped (based on the data in the measurement table). For example:

  station_id; category_id; taken_start; taken_end
  1;          4;           1984-07-01;  1996-11-30
  1;          5;           1984-07-01;  1996-11-30
  1;          6;           1984-07-01;  1996-11-10
  1;          7;           1984-07-01;  1996-10-31

This means that station 1 has data for categories 4 through 7. The measurement table returns 3865 rows for station 1 and category 7 (this uses an index and took 7 seconds cold):

  station_id; taken;      amount
  1;          1984-07-01; 0.00
  1;          1984-07-02; 0.00
  1;          1984-07-03; 0.00
  1;          1984-07-04; 0.00

The station_category table is basically another index. Would explicitly sorting the measurement table (273M rows) by station then by date help?

Dave
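For context, a station_category table like the one described can be rebuilt straight from the measurement data -- a minimal sketch, assuming the column names used elsewhere in this thread:

  -- One row per (station, category): the first and last dates for which
  -- measurements exist.
  INSERT INTO climate.station_category (station_id, category_id, taken_start, taken_end)
  SELECT m.station_id,
         m.category_id,
         min(m.taken) AS taken_start,
         max(m.taken) AS taken_end
  FROM climate.measurement m
  GROUP BY m.station_id, m.category_id;

  ANALYZE climate.station_category;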
Re: [PERFORM] Random Page Cost and Planner
Hi,

> > And this is what happens in the queries above - the first query covers
> > years 1963-2009, while the second one covers 1900-2009. Given the fact
> > this table contains ~40m rows, the first query returns about 0.01% (3k
> > rows) while the second one returns almost 50% of the data (18m rows). So
> > I doubt this might be improved using an index ...
>
> I don't think that's what I'm doing. There are two tables involved:
> station_category (sc) and measurement (m). The first part of the query:
>
>     extract(YEAR FROM sc.taken_start) >= 1900 AND
>     extract(YEAR FROM sc.taken_end) <= 2009 AND
>
> That is producing a limit on the station_category table. There are, as far
> as I can tell, no stations that have been taking weather readings for 110
> years. Most of them have a lifespan of 24 years. The above condition just
> makes sure that I don't get data before 1900 or after 2009.

OK, I admit I'm a little bit confused by the query, especially by these rows:

    sc.taken_start >= '1900-01-01'::date AND
    sc.taken_end <= '1996-12-31'::date AND
    m.taken BETWEEN sc.taken_start AND sc.taken_end AND

which seem to me a little bit convoluted. Well, I think I understand what that means -- give me all stations for a given city, collecting the category of data at a certain time. But I'm afraid this makes the planning much more difficult, as the select from measurements depends on the data returned by other parts of the query (rows from category).

See this http://explain.depesz.com/s/H1 and this http://explain.depesz.com/s/GGx

I guess the planner is confused in the second case -- it believes it has to read a lot more data from the measurement table, and so chooses the sequential scan. The question is whether this is the right decision (I believe it is not).

How many rows does the query return without the group by clause? About 14 in both cases, right?

> > by time (each year a separate partition) but you'll have to investigate
> > that on your own (depends on your use-cases).
>
> I cannot partition by time. First, there are 7 categories, which would mean
> 770 partitions if I did it by year -- 345000 rows per partition. This will
> grow in the future. I have heard there are troubles with having lots of
> child tables (too many files for the operating system). Second, the user
> has the ability to pick arbitrary day ranges for arbitrary year spans.
> There's a year wrapping issue that I won't explain because I never get it
> right the first time. ;-)

OK, I hadn't noticed the table is already partitioned by category_id, and I didn't mean to partition by (taken, category_id) -- that would produce a lot of partitions. Yes, that might cause problems related to the number of files, but that's rather a filesystem-related issue. I'd expect issues related to RULEs or triggers instead (not sure which of them you use to redirect the data into partitions).

But when partitioning by time (and not by category_id) the number of partitions will be much lower, and you don't have to keep all of the rules active -- all you need is a rule for the current year (and maybe the next one).

I'm not sure what you mean by the 'year wrapping issue' but I think it might work quite well -- right now the problem is PostgreSQL decides to scan the whole partition (all data for a given category_id).

regards
Tomas
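A rough sketch of the per-year partitioning Tomas suggests, using the CHECK-constraint/inheritance scheme available in 8.4 (table and column names follow this thread; routing inserts with a rule or trigger is left out):

  -- One child table per year, with a constraint the planner can use to skip
  -- partitions (requires constraint_exclusion = on or 'partition').
  CREATE TABLE climate.measurement_2009 (
    CHECK (taken >= DATE '2009-01-01' AND taken < DATE '2010-01-01')
  ) INHERITS (climate.measurement);

  CREATE INDEX measurement_2009_stc_idx
    ON climate.measurement_2009 (station_id, taken, category_id);

  SET constraint_exclusion = on;

  -- A query bounded on "taken" now only touches the partitions whose CHECK
  -- ranges overlap the requested dates.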
Re: [PERFORM] Random Page Cost and Planner
Hi, Kevin.

> below something in the range of 1.5 to 2 is probably not going to be a
> good choice for the mix as a whole.

Good to know; thanks.

> This should probably be set to something on the order of 3GB. This will
> help the optimizer make more intelligent choices about when use of the
> index will be a win.

I'll try this.

> times. You'll get your best information if you can simulate a
> more-or-less realistic load, and try that with various settings and

I have no idea what a realistic load will be. The system is still in development and not open to the general public. I also don't know how much publicity the system will receive when finished. Could be a few hundred hits, could be over ten thousand. I want the system to be ready for the latter case, which means it needs to return data for many different query parameters (date span, elevation, year, radius, etc.) in under two seconds.

> indexes. The cache turnover and resource contention involved in
> production can influence performance, and are hard to estimate any
> other way.

Another person suggested taking a look at the data. I ran a query to see if it makes sense to split the data by year. The trouble is that there are 110 years and 7 categories. The data is already filtered into child tables by category (that is logical because reporting on two different categories is nonsensical -- it is meaningless to report on snow depth and temperature: we already know it needs to be cold for snow).

  count; decade start; decade end; min date;   max date
  3088;  1990;         2000;       1990-01-01; 2009-12-31
  2925;  1980;         2000;       1980-01-01; 2009-12-31
  2752;  2000;         2000;       2000-01-01; 2009-12-31
  2487;  1970;         1970;       1970-01-01; 1979-12-31
  2391;  1980;         1990;       1980-02-01; 1999-12-31
  2221;  1980;         1980;       1980-01-01; 1989-12-31
  1934;  1960;         2000;       1960-01-01; 2009-12-31
  1822;  1960;         1960;       1960-01-01; 1969-12-31
  1659;  1970;         1980;       1970-01-01; 1989-12-31
  1587;  1960;         1970;       1960-01-01; 1979-12-31
  1524;  1970;         2000;       1970-01-01; 2009-12-31

The majority of data collected by weather stations is between 1960 and 2009, which makes sense because transistor technology would have made for (relatively) inexpensive automated monitoring stations. Or maybe there were more people and more taxes collected, thus a bigger budget for weather study. Either way. ;-)

The point is the top three decades (1990, 1980, 2000) have the most data, giving me a few options:

- Split the seven tables twice more: before 1960 and after 1960.
- Split the seven tables by decade.

The first case gives 14 tables. The second case gives 102 tables (at 2.5M rows per table) as there are about 17 decades in total. This seems like a manageable number of tables as the data might eventually span 22 decades, which would be 132 tables.

Even though the users will be selecting 1900 to 2009, most of the stations themselves will be within the 1960-2009 range, with the majority of those active between 1980 and 2009.

Would splitting by decade improve the speed?

Thank you very much.

Dave
Re: [PERFORM] Random Page Cost and Planner
Hi,

> sc.taken_end <= '1996-12-31'::date AND
> m.taken BETWEEN sc.taken_start AND sc.taken_end AND
>
> category of data at a certain time. But I'm afraid this makes the planning
> much more difficult, as the select from measurements depends on the data
> returned by other parts of the query (rows from category).

Right. Users can select 1900 - 2009. Station data hardly ever spans that range. The station_category table is used to create a unique key into the measurement data for every station: station_id, category_id, and taken_start. The measurement data should be contiguous until taken_end. I thought that that combination would be a pointer to the exact spot in the measurement table where the data starts, which should be ridiculously fast to find.

> See this http://explain.depesz.com/s/H1 and this http://explain.depesz.com/s/GGx

I was getting some red lines when I looked at a different plan. It's a great site.

> How many rows does the query return without the group by clause? About 14
> in both cases, right?

  SELECT *
  FROM climate.measurement m
  WHERE
    m.station_id = 5148 AND
    m.taken BETWEEN '1900-08-01'::date AND '2009-12-31'::date AND
    m.category_id = 1

5397 rows (10 seconds cold; 0.5 seconds hot); estimated too high by 2275 rows? http://explain.depesz.com/s/uq

> OK, I hadn't noticed the table is already partitioned by category_id, and
> I didn't mean to partition by (taken, category_id) -- that would produce a
> lot of partitions. Yes, that might cause problems related to the number of
> files, but that's rather a filesystem-related issue.

Constrained as:

  CONSTRAINT measurement_013_category_id_ck CHECK (category_id = 7)

> I'd expect issues related to RULEs or triggers instead (not sure which of
> them you use to redirect the data into partitions). But when partitioning

I created seven child tables of measurement. Each of these has a constraint by category_id. This makes it extremely fast to select the correct partition.

> I'm not sure what you mean by the 'year wrapping issue' but I think it
> might work quite well -- right now the problem is PostgreSQL decides to
> scan the whole partition (all data for a given category_id).

I'll give it another try. :-)

Use Case #1
  User selects: Mar 22 to Dec 22
  User selects: 1900 to 2009
  Result: Query should average 9 months of climate data per year, between
  Mar 22 and Dec 22 of each year.

Use Case #2
  User selects: Dec 22 to Mar 22
  User selects: 1900 to 2009
  Result: Query should average 3 months of climate data per year, between
  Dec 22 of each year and Mar 22 of the following year.

So if a user selects 1950 to 1960:

- the first case should average between 1950 and 1960; and
- the second case should average between 1950 and 1961.

Dave
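The wrapped range in Use Case #2 can be built by shifting the end date into the following year; a sketch for a single season only, with the station, category, and year values being the sample ones from elsewhere in the thread:

  -- Winter season: Dec 22 of one year through Mar 22 of the next.
  SELECT extract(YEAR FROM m.taken) AS year, avg(m.amount) AS amount
  FROM climate.measurement m
  WHERE m.station_id = 754
    AND m.category_id = 7
    AND m.taken BETWEEN to_date(1950       || '-12-22', 'YYYY-MM-DD')
                    AND to_date((1950 + 1) || '-03-22', 'YYYY-MM-DD')
  GROUP BY extract(YEAR FROM m.taken);

  -- The full report would generate one such range per year in the span.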
Re: [PERFORM] Random Page Cost and Planner
I was told to try OVERLAPS instead of checking years. The query is now:

  SELECT
    extract(YEAR FROM m.taken) AS year,
    avg(m.amount) AS amount
  FROM
    climate.city c,
    climate.station s,
    climate.station_category sc,
    climate.measurement m
  WHERE
    c.id = 5148 AND
    earth_distance(
      ll_to_earth(c.latitude_decimal, c.longitude_decimal),
      ll_to_earth(s.latitude_decimal, s.longitude_decimal)) / 1000 <= 30 AND
    s.elevation BETWEEN 0 AND 3000 AND
    s.applicable = TRUE AND
    sc.station_id = s.id AND
    sc.category_id = 7 AND
    (sc.taken_start, sc.taken_end) OVERLAPS ('1900-01-01'::date, '2009-12-31'::date) AND
    m.station_id = s.id AND
    m.taken BETWEEN sc.taken_start AND sc.taken_end AND
    m.category_id = sc.category_id
  GROUP BY
    extract(YEAR FROM m.taken)
  ORDER BY
    extract(YEAR FROM m.taken)

25 seconds from cold, no full table scan: http://explain.depesz.com/s/VV5

Much better than 85 seconds, but still an order of magnitude too slow.

I was thinking of changing the station_category table to use the measurement table's primary key instead of keying off the date, as converting the dates for comparison strikes me as a bit of overhead. Also, I can remove the / 1000 by changing the Earth's radius to kilometres (from metres), but a constant division shouldn't be significant.

I really appreciate all your patience and help over the last sixteen days trying to optimize this database and these queries.

Dave
Re: [PERFORM] Random Page Cost and Planner
Hi,

I changed the date comparison to be based on year alone:

  extract(YEAR FROM sc.taken_start) >= 1900 AND
  extract(YEAR FROM sc.taken_end) <= 2009 AND

The indexes are now always used; if someone wants to explain why using the numbers works (a constant) but using a date (another constant?) does not work, I'd really appreciate it.

Thanks again, everybody, for your time and help.

Dave
Re: [PERFORM] Random Page Cost and Planner
David Jarvis thanga...@gmail.com wrote:

> The value for random_page_cost was at 2.0; reducing it to 1.1 had a
> massive performance improvement (nearly an order of magnitude). While
> the results now return in 5 seconds (down from ~85 seconds)

It sounds as though the active portion of your database is pretty much cached in RAM. True?

> problematic lines remain. Bumping the query's end date by a single
> year causes a full table scan. How do I persuade PostgreSQL to use the
> indexes, regardless of number of years between the two dates?

I don't know about "regardless of the number of years" -- but you can make such plans look more attractive by cutting both random_page_cost and seq_page_cost. Some highly cached loads perform well with these set to equal values on the order of 0.1 to 0.001.

> (A full table scan against 43 million rows is probably not the best
> plan.)

It would tend to be better than random access to 43 million rows, at least if you need to go to disk for many of them.

-Kevin
Re: [PERFORM] Random Page Cost and Planner
Hi, Kevin.

Thanks for the response.

> It sounds as though the active portion of your database is pretty much
> cached in RAM. True?

I would not have thought so; there are seven tables, each with 39 to 43 million rows, as:

  CREATE TABLE climate.measurement (
    id bigserial NOT NULL,
    taken date NOT NULL,
    station_id integer NOT NULL,
    amount numeric(8,2) NOT NULL,
    flag character varying(1) NOT NULL DEFAULT ' '::character varying,
    category_id smallint NOT NULL
  );

The machine has 4GB of RAM, donated to PG as follows:

  shared_buffers = 1GB
  temp_buffers = 32MB
  work_mem = 32MB
  maintenance_work_mem = 64MB
  effective_cache_size = 256MB

Everything else is at its default value. The kernel:

  $ cat /proc/sys/kernel/shmmax
  2147483648

Two postgres processes are enjoying the (virtual) space:

  2619 postgres 20 0 1126m 524m 520m S 0 13.2 0:09.41 postgres
  2668 postgres 20 0 1124m 302m 298m S 0  7.6 0:04.35 postgres

> can make such plans look more attractive by cutting both
> random_page_cost and seq_page_cost. Some highly cached loads perform
> well with these set to equal values on the order of 0.1 to 0.001.

I tried this: no improvement.

> It would tend to be better than random access to 43 million rows, at
> least if you need to go to disk for many of them.

I thought that the index would take care of this? The index has been set to the unique key of: station_id, taken, and category_id (the filter for child tables). Each time I scan for data, I always provide the station identifier and its date range. The date range is obtained from another table (given the same station_id).

I will be trying various other indexes. I've noticed now that sometimes the results are very quick and sometimes very slow. For the query I posted, it would be great to know what would be the best indexes to use. I have a suspicion that that's going to require trial and many errors.

Dave
Re: [PERFORM] Random Page Cost and Planner
David Jarvis thanga...@gmail.com writes:

>> It sounds as though the active portion of your database is pretty
>> much cached in RAM. True?
>
> I would not have thought so; there are seven tables, each with 39 to
> 43 million rows as:
> [ perhaps 64 bytes per row ]
> The machine has 4GB of RAM, donated to PG as follows:

Well, the thing you need to be *really* wary of is setting the cost parameters to make isolated tests look good. When you repeat a particular test case multiple times, all times after the first probably are fully cached ... but if your DB doesn't actually fit in RAM, that might not be too representative of what will happen under load. So if you want to cut the xxx_page_cost settings some more, pay close attention to what happens to average response time.

regards, tom lane
Re: [PERFORM] Random Page Cost and Planner
On Tue, May 25, 2010 at 4:26 PM, David Jarvis thanga...@gmail.com wrote:

> shared_buffers = 1GB
> temp_buffers = 32MB
> work_mem = 32MB
> maintenance_work_mem = 64MB
> effective_cache_size = 256MB

Shouldn't effective_cache_size be significantly larger?

--
Rob Wultsch
wult...@gmail.com
Re: [PERFORM] Random Page Cost and Planner
Hi, Tom.

Yes, that is what happened, making the tests rather meaningless, and giving me the false impression that the indexes were being used. They were, but only because of cached results. When multiple users make different queries, the performance will return to ~80s per query.

I also tried Kevin's suggestion, which had no noticeable effect:

  effective_cache_size = 512MB

That said, when using the following condition, the query is fast (1 second):

  extract(YEAR FROM sc.taken_start) >= 1963 AND
  extract(YEAR FROM sc.taken_end) <= 2009 AND

  -> Index Scan using measurement_013_stc_idx on measurement_013 m
     (cost=0.00..511.00 rows=511 width=15) (actual time=0.018..3.601 rows=3356 loops=104)
       Index Cond: ((m.station_id = sc.station_id) AND (m.taken >= sc.taken_start) AND
                    (m.taken <= sc.taken_end) AND (m.category_id = 7))

This condition makes it slow (13 seconds on first run, 8 seconds thereafter):

  extract(YEAR FROM sc.taken_start) >= 1900 AND
  extract(YEAR FROM sc.taken_end) <= 2009 AND

       Filter: (category_id = 7)
  -> Seq Scan on measurement_013 m
     (cost=0.00..359704.80 rows=18118464 width=15) (actual time=0.008..4025.692 rows=18118395 loops=1)

At this point, I'm tempted to write a stored procedure that iterates over each station category for all the years of each station. My guess is that the planner's estimate for the number of rows that will be returned by extract(YEAR FROM sc.taken_start) >= 1900 is incorrect and so it chooses a full table scan for all rows. Even though the lower bound appears to be a constant value of 1900, the average year a station started collecting data was 44 years ago (1965), and did so for an average of 21.4 years.

The part I am having trouble with is convincing PG to use the index for the station ID and the date range for when the station was active. Each station has a unique ID; the data in the measurement table is ordered by measurement date then by station.

Should I add a clustered index by station then by date? Any other suggestions are very much appreciated.

Dave
[PERFORM] Random Page Cost and Planner
Hi,

I wrote a query (see below) that extracts climate data from weather stations within a given radius of a city, using the dates for which those weather stations actually have data. The query uses the measurement table's only index:

  CREATE UNIQUE INDEX measurement_001_stc_idx
    ON climate.measurement_001
    USING btree (station_id, taken, category_id);

The value for random_page_cost was at 2.0; reducing it to 1.1 had a massive performance improvement (nearly an order of magnitude). While the results now return in 5 seconds (down from ~85 seconds), problematic lines remain. Bumping the query's end date by a single year causes a full table scan:

  sc.taken_start >= '1900-01-01'::date AND
  sc.taken_end <= '1997-12-31'::date AND

How do I persuade PostgreSQL to use the indexes, regardless of the number of years between the two dates? (A full table scan against 43 million rows is probably not the best plan.) Find the EXPLAIN ANALYSE results below the query.

Thanks again!

Dave

Query:

  SELECT
    extract(YEAR FROM m.taken) AS year,
    avg(m.amount) AS amount
  FROM
    climate.city c,
    climate.station s,
    climate.station_category sc,
    climate.measurement m
  WHERE
    c.id = 5182 AND
    earth_distance(
      ll_to_earth(c.latitude_decimal, c.longitude_decimal),
      ll_to_earth(s.latitude_decimal, s.longitude_decimal)) / 1000 <= 30 AND
    s.elevation BETWEEN 0 AND 3000 AND
    s.applicable = TRUE AND
    sc.station_id = s.id AND
    sc.category_id = 1 AND
    sc.taken_start >= '1900-01-01'::date AND
    sc.taken_end <= '1996-12-31'::date AND
    m.station_id = s.id AND
    m.taken BETWEEN sc.taken_start AND sc.taken_end AND
    m.category_id = sc.category_id
  GROUP BY
    extract(YEAR FROM m.taken)
  ORDER BY
    extract(YEAR FROM m.taken)

1900 to 1996: Index

  Sort (cost=1348597.71..1348598.21 rows=200 width=12) (actual time=2268.929..2268.935 rows=92 loops=1)
    Sort Key: (date_part('year'::text, (m.taken)::timestamp without time zone))
    Sort Method: quicksort Memory: 32kB
    -> HashAggregate (cost=1348586.56..1348590.06 rows=200 width=12) (actual time=2268.829..2268.886 rows=92 loops=1)
       -> Nested Loop (cost=0.00..1344864.01 rows=744510 width=12) (actual time=0.807..2084.206 rows=134893 loops=1)
            Join Filter: ((m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (sc.station_id = m.station_id))
            -> Nested Loop (cost=0.00..12755.07 rows=1220 width=18) (actual time=0.502..521.937 rows=23 loops=1)
                 Join Filter: ((sec_to_gc(cube_distance((ll_to_earth((c.latitude_decimal)::double precision, (c.longitude_decimal)::double precision))::cube, (ll_to_earth((s.latitude_decimal)::double precision, (s.longitude_decimal)::double precision))::cube)) / 1000::double precision) <= 30::double precision)
                 -> Index Scan using city_pkey1 on city c (cost=0.00..2.47 rows=1 width=16) (actual time=0.014..0.015 rows=1 loops=1)
                      Index Cond: (id = 5182)
                 -> Nested Loop (cost=0.00..9907.73 rows=3659 width=34) (actual time=0.014..28.937 rows=3458 loops=1)
                      -> Seq Scan on station_category sc (cost=0.00..970.20 rows=3659 width=14) (actual time=0.008..10.947 rows=3458 loops=1)
                           Filter: ((taken_start >= '1900-01-01'::date) AND (taken_end <= '1996-12-31'::date) AND (category_id = 1))
                      -> Index Scan using station_pkey1 on station s (cost=0.00..2.43 rows=1 width=20) (actual time=0.004..0.004 rows=1 loops=3458)
                           Index Cond: (s.id = sc.station_id)
                           Filter: (s.applicable AND (s.elevation >= 0) AND (s.elevation <= 3000))
            -> Append (cost=0.00..1072.27 rows=947 width=18) (actual time=6.996..63.199 rows=5865 loops=23)
                 -> Seq Scan on measurement m (cost=0.00..25.00 rows=6 width=22) (actual time=0.000..0.000 rows=0 loops=23)
                      Filter: (m.category_id = 1)
                 -> Bitmap Heap Scan on measurement_001 m (cost=20.79..1047.27 rows=941 width=18) (actual time=6.995..62.390 rows=5865 loops=23)
                      Recheck Cond: ((m.station_id = sc.station_id) AND (m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (m.category_id = 1))
                      -> Bitmap Index Scan on measurement_001_stc_idx (cost=0.00..20.55 rows=941 width=0) (actual time=5.775..5.775 rows=5865 loops=23)
                           Index Cond: ((m.station_id = sc.station_id) AND (m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (m.category_id = 1))
  Total runtime: 2269.264 ms

1900 to 1997: Full Table Scan

  Sort (cost=1370192.26..1370192.76 rows=200 width=12) (actual time=86165.797..86165.809 rows=94 loops=1)
    Sort Key: (date_part('year'::text, (m.taken)::timestamp without time zone))
    Sort Method: quicksort Memory: 32kB
    -> HashAggregate (cost=1370181.12..1370184.62 rows=200 width=12)