Re: [PERFORM] Random Page Cost and Planner
Hi, Rob.

I tried bumping the effective_cache_size. It made no difference.

My latest attempt at forcing PostgreSQL to use the indexes involved two
loops: one to loop over the stations, the other to extract the station data
from the measurement table. The outer loop executes in 1.5 seconds. The
inner loop does a full table scan for each record in the outer loop:

  FOR station IN
    SELECT
      sc.station_id,
      sc.taken_start,
      sc.taken_end
    FROM
      climate.city c,
      climate.station s,
      climate.station_category sc
    WHERE
      c.id = city_id AND
      earth_distance(
        ll_to_earth(c.latitude_decimal, c.longitude_decimal),
        ll_to_earth(s.latitude_decimal, s.longitude_decimal)) / 1000 <= radius AND
      s.elevation BETWEEN elevation1 AND elevation2 AND
      s.applicable AND
      sc.station_id = s.id AND
      sc.category_id = category_id AND
      extract(YEAR FROM sc.taken_start) >= year1 AND
      extract(YEAR FROM sc.taken_end) <= year2
    ORDER BY
      sc.station_id
  LOOP
    RAISE NOTICE 'B.1. % % %', station.station_id, station.taken_start, station.taken_end;

    FOR measure IN
      SELECT
        extract(YEAR FROM m.taken) AS year,
        avg(m.amount) AS amount
      FROM
        climate.measurement m
      WHERE
        *m.station_id = station.station_id AND
        m.taken BETWEEN station.taken_start AND station.taken_end AND
        m.category_id = category_id*
      GROUP BY
        extract(YEAR FROM m.taken)
    LOOP
      RAISE NOTICE '  B.2. % %', measure.year, measure.amount;
    END LOOP;
  END LOOP;

I thought that the bold lines (starred above) would have triggered index
use. The values used for the inner query:

  NOTICE: B.1. 754 1980-08-01 2001-11-30

When I run the query manually, using constants, it executes in ~25
milliseconds:

  SELECT
    extract(YEAR FROM m.taken) AS year,
    avg(m.amount) AS amount
  FROM
    climate.measurement m
  WHERE
    m.station_id = 754 AND
    m.taken BETWEEN '1980-08-01'::date AND '2001-11-30'::date AND
    m.category_id = 7
  GROUP BY
    extract(YEAR FROM m.taken)

With 106 rows (one inner query per station) it should execute in ~2.65
seconds, which is better than the 5 seconds I get when everything is cached
and a tremendous improvement over the ~85 seconds from cold.

I do not understand why the query below uses a full table scan (it executes
in ~13 seconds):

  SELECT
    extract(YEAR FROM m.taken) AS year,
    avg(m.amount) AS amount
  FROM
    climate.measurement m
  WHERE
    *m.station_id = station.station_id AND
    m.taken BETWEEN station.taken_start AND station.taken_end AND
    m.category_id = category_id*
  GROUP BY
    extract(YEAR FROM m.taken)

Moreover, what can I do to solve the problem?

Thanks again!

Dave
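One workaround worth trying (an untested sketch; it needs 8.3+ for the
text-to-non-text concatenation): build the inner query as dynamic SQL, so
each iteration is planned with the station's literal values instead of
PL/pgSQL variables. Literal constants are what the planner needs to choose
the index, and, where measurement is partitioned into child tables, to
exclude the partitions that cannot match:

  FOR measure IN EXECUTE
    'SELECT extract(YEAR FROM m.taken) AS year, avg(m.amount) AS amount ' ||
    '  FROM climate.measurement m ' ||
    ' WHERE m.station_id = ' || station.station_id ||
    '   AND m.taken BETWEEN ' || quote_literal(station.taken_start::text) ||
    '   AND ' || quote_literal(station.taken_end::text) ||
    '   AND m.category_id = ' || category_id ||
    ' GROUP BY extract(YEAR FROM m.taken)'
  LOOP
    -- same per-year output as before, but each query is planned afresh
    RAISE NOTICE '  B.2. % %', measure.year, measure.amount;
  END LOOP;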
Re: [PERFORM] Performance issues when the number of records are around 10 Million
On Wed, May 12, 2010 at 1:45 AM, venu madhav wrote:
> [Venu] Yes, autovacuum is running every hour. I can see it in the log
> messages. All the configuration parameters for autovacuum are disabled
> except that it should run every hour. This application runs on an
> embedded box, so I can't change the parameters, as they affect the other
> applications running on it. Can you please explain what you mean by
> default parameters?
>
> autovacuum = on            # enable autovacuum subprocess?
> autovacuum_naptime = 3600  # time between autovacuum runs, in secs

The default value for autovacuum_naptime is a minute. Why would you want
to increase it by a factor of 60? That seems likely to result in I/O
spikes, table bloat, and generally poor performance.

There are dramatic performance improvements in PostgreSQL 8.3 and 8.4.
Upgrading would probably help, a lot. The points already made about LIMIT
are also right on target.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
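For comparison, the stock settings in postgresql.conf look roughly like
this (illustrative lines, not taken from Venu's actual config file):

  autovacuum = on            # the default from 8.3 onward
  autovacuum_naptime = 1min  # the default; at 3600s, dead rows pile up
                             # for an hour before cleanup can even begin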
Re: [PERFORM] Function scan/Index scan to nested loop
On Tue, May 11, 2010 at 2:00 PM, Carlo Stonebanks wrote:
> I am concerned that there is such a lag between all the index and
> function scans' start/complete times and the nested loops starting. I
> have reformatted the SLOW PLAN results below to make them easier to
> read. Can you tell me if this makes any sense to you?

I think you want to run EXPLAIN ANALYZE on the queries that are being
executed BY mdx_core.zips_in_mile_range('75203', 15::numeric), rather than
on the query that calls that function. You should be able to see the same
caching effect there, and looking at that plan might give you a better
idea of what is really happening. (Note that you might need to use PREPARE
and EXPLAIN EXECUTE to get the same plan the function is generating
internally, rather than just EXPLAIN.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
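A self-contained illustration of why the two can differ (a throwaway demo
table, not the actual mdx_core query):

  CREATE TEMP TABLE probe_demo AS
      SELECT g AS id FROM generate_series(1, 100000) g;
  CREATE INDEX probe_demo_id_idx ON probe_demo (id);
  ANALYZE probe_demo;

  -- plan built without knowing $1, as inside a function's cached plan:
  PREPARE probe (integer) AS
      SELECT count(*) FROM probe_demo WHERE id < $1;
  EXPLAIN EXECUTE probe(5);

  -- plan built with the constant visible to the planner:
  EXPLAIN SELECT count(*) FROM probe_demo WHERE id < 5;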
Re: [PERFORM] Random Page Cost and Planner
Hi, Tom.

Yes, that is what happened, making the tests rather meaningless, and giving
me the false impression that the indexes were being used. They were, but
only because of cached results. When multiple users are making different
queries, performance will return to ~80s per query.

I also tried Kevin's suggestion, which had no noticeable effect:

  effective_cache_size = 512MB

That said, when using the following condition, the query is fast (1 second):

  extract(YEAR FROM sc.taken_start) >= 1963 AND
  extract(YEAR FROM sc.taken_end) <= 2009 AND

  "->  Index Scan using measurement_013_stc_idx on measurement_013 m
       (cost=0.00..511.00 rows=511 width=15)
       (actual time=0.018..3.601 rows=3356 loops=104)"
  "      Index Cond: ((m.station_id = sc.station_id) AND
       (m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND
       (m.category_id = 7))"

This condition makes it slow (13 seconds on first run, 8 seconds
thereafter):

  *extract(YEAR FROM sc.taken_start) >= 1900* AND
  extract(YEAR FROM sc.taken_end) <= 2009 AND

  "      Filter: (category_id = 7)"
  "->  Seq Scan on measurement_013 m
       (cost=0.00..359704.80 rows=18118464 width=15)
       (actual time=0.008..4025.692 rows=18118395 loops=1)"

At this point, I'm tempted to write a stored procedure that iterates over
each station category for all the years of each station. My guess is that
the planner's estimate for the number of rows that will be returned by
*extract(YEAR FROM sc.taken_start) >= 1900* is incorrect, and so it chooses
a full table scan for all rows. Even though the lower bound appears to be a
constant value of 1900, the average year a station started collecting data
was 44 years ago (1965), and it did so for an average of 21.4 years.

The part I am having trouble with is convincing PG to use the index for the
station ID and the date range for when the station was active. Each station
has a unique ID; the data in the measurement table is ordered by
measurement date then by station.

Should I add a clustered index by station then by date?

Any other suggestions are very much appreciated.

Dave
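For reference, a clustering experiment could look like this (an untested
sketch; the index name is invented, CLUSTER takes an exclusive lock and
rewrites the table, and the physical ordering decays as new rows arrive):

  CREATE INDEX measurement_013_station_taken_idx
      ON climate.measurement_013 (station_id, taken);

  CLUSTER climate.measurement_013
      USING measurement_013_station_taken_idx;  -- 8.3+ syntax

  ANALYZE climate.measurement_013;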
Re: [PERFORM] Random Page Cost and Planner
On Tue, May 25, 2010 at 4:26 PM, David Jarvis wrote:
> shared_buffers = 1GB
> temp_buffers = 32MB
> work_mem = 32MB
> maintenance_work_mem = 64MB
> effective_cache_size = 256MB

Shouldn't effective_cache_size be significantly larger?

--
Rob Wultsch
wult...@gmail.com
Re: [PERFORM] Random Page Cost and Planner
David Jarvis writes:
>> It sounds as though the active portion of your database is pretty
>> much cached in RAM. True?

> I would not have thought so; there are seven tables, each with 39 to 43
> million rows as: [ perhaps 64 bytes per row ]
> The machine has 4GB of RAM, donated to PG as follows:

Well, the thing you need to be *really* wary of is setting the cost
parameters to make isolated tests look good. When you repeat a particular
test case multiple times, all times after the first probably are fully
cached ... but if your DB doesn't actually fit in RAM, that might not be
too representative of what will happen under load. So if you want to cut
the xxx_page_cost settings some more, pay close attention to what happens
to average response time.

regards, tom lane
Re: [PERFORM] Random Page Cost and Planner
Hi, Kevin.

Thanks for the response.

> It sounds as though the active portion of your database is pretty
> much cached in RAM. True?

I would not have thought so; there are seven tables, each with 39 to 43
million rows as:

  CREATE TABLE climate.measurement (
    id bigserial NOT NULL,
    taken date NOT NULL,
    station_id integer NOT NULL,
    amount numeric(8,2) NOT NULL,
    flag character varying(1) NOT NULL DEFAULT ' '::character varying,
    category_id smallint NOT NULL
  );

The machine has 4GB of RAM, donated to PG as follows:

  shared_buffers = 1GB
  temp_buffers = 32MB
  work_mem = 32MB
  maintenance_work_mem = 64MB
  effective_cache_size = 256MB

Everything else is at its default value. The kernel:

  $ cat /proc/sys/kernel/shmmax
  2147483648

Two postgres processes are enjoying the (virtual) space:

  2619 postgres  20  0  1126m 524m 520m S  0 13.2  0:09.41 postgres
  2668 postgres  20  0  1124m 302m 298m S  0  7.6  0:04.35 postgres

> You can make such plans look more attractive by cutting both
> random_page_cost and seq_page_cost. Some highly cached loads
> perform well with these set to equal values on the order of 0.1 to
> 0.001.

I tried this: no improvement.

> It would tend to be better than random access to 43 million rows, at
> least if you need to go to disk for many of them.

I thought that the index would take care of this? The index has been set
to the unique key of: station_id, taken, and category_id (the filter for
child tables).

Each time I scan for data, I always provide the station identifier and its
date range. The date range is obtained from another table (given the same
station_id).

I will be trying various other indexes. I've noticed now that sometimes
the results are very quick and sometimes very slow. For the query I
posted, it would be great to know what would be the best indexes to use. I
have a suspicion that that's going to require trial and many errors.

Dave
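One way to compare candidate indexes is to re-run the constant-value query
from the top of this thread under EXPLAIN ANALYZE after each change, and
watch which index (if any) each child table picks:

  EXPLAIN ANALYZE
  SELECT extract(YEAR FROM m.taken) AS year,
         avg(m.amount) AS amount
    FROM climate.measurement m
   WHERE m.station_id = 754
     AND m.taken BETWEEN '1980-08-01'::date AND '2001-11-30'::date
     AND m.category_id = 7
   GROUP BY extract(YEAR FROM m.taken);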
Re: [PERFORM] shared_buffers advice
On Tue, May 25, 2010 at 5:58 AM, Konrad Garus wrote:
> 2010/5/24 Merlin Moncure:
>
>> *) a page fault to disk is a much bigger deal than a fault to pg cache
>> vs os cache.
>
> That was my impression. That's why I did not touch our 2/16 GB setting
> right away. I guess that 2 more gigabytes in OS cache are better than 2
> more (duplicated) gigabytes in PG shared_buffers. In our case 2 GB
> shared_buffers appears to be enough to avoid thrashing between OS and PG.
>
>> *) shared_buffers is one of the _least_ important performance settings
>> in postgresql.conf
>>
>> Many settings, like work_mem, planner tweaks, commit settings,
>> autovacuum settings
>
> Can you recommend any sources on these parameters, especially commit
> settings and planner tweaks?
>
> Thank you so much for the whole answer! Not only does it address the
> immediate question, but also many of the unasked ones that I had in the
> back of my head. It's brief and gives a broad view over all the
> performance concerns. It should be part of the documentation or the
> first page of the performance wiki. Have you copied it from somewhere?

Thank you for your nice comments. This was strictly a brain dump from
yours truly. There is a fairly verbose guide on the wiki
(http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server). There is
a lot of good info there, but it's missing a few things
(from_collapse_limit, for example).

I would prefer to see the annotated, performance-oriented .conf settings
written in terms of trade-offs: set it too low and you get X, too high and
you get Y, so set it in order to get Z. For example, did you know that if
you crank max_locks_per_transaction you also increase the duration of
every query that hits pg_locks? Well, now you do :-).

merlin
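Pending such a guide, those trade-offs are cheap to explore per session
before touching postgresql.conf; a few illustrative examples (the values
are arbitrary, not recommendations):

  SET work_mem = '64MB';          -- too low: sorts/hashes spill to disk;
                                  -- too high: many concurrent sessions
                                  -- can exhaust RAM
  SET synchronous_commit = off;   -- 8.3+: faster commits, risking loss of
                                  -- the last few transactions on a crash
                                  -- (but no corruption)
  SET from_collapse_limit = 12;   -- one of the planner tweaks named above
  RESET ALL;                      -- return to the config-file values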
Re: [PERFORM] prepared query performs much worse than regular query
On Tue, May 25, 2010 at 11:27:08AM -0700, Scott Carey wrote:
> On May 21, 2010, at 8:26 PM, Matthew Wakeling wrote:
> > On Fri, 21 May 2010, Richard Yen wrote:
> >> Any ideas why the query planner chooses a different query plan when
> >> using prepared statements?
> >
> > This is a FAQ. Preparing a statement makes Postgres create a plan,
> > without knowing the values that you will plug in, so it will not be as
> > optimal as if the values were available. The whole idea is to avoid
> > the planning cost each time the query is executed, but if your data is
> > unusual it can result in worse plans.
>
> Maybe the planner could note a prepared query parameter is on a high
> skew column and build a handful of plans to choose from, or just
> partially re-plan on the skewed column with each execution. Or make it
> easier for a user to have a prepared statement that re-plans the query
> each time. Even just a per connection parameter "SET
> prepared.query.cacheplan = FALSE"

There was talk at this year's developers' meeting of doing the replanning
you've suggested ("Re(?)plan parameterized plans with actual parameter
values" on http://wiki.postgresql.org/wiki/PgCon_2010_Developer_Meeting,
specifically). This wouldn't show up until at least 9.1, but it's
something people are thinking about.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com
Re: [PERFORM] Random Page Cost and Planner
David Jarvis wrote:
> The value for *random_page_cost* was at 2.0; reducing it to 1.1 had a
> massive performance improvement (nearly an order of magnitude). While
> the results now return in 5 seconds (down from ~85 seconds)

It sounds as though the active portion of your database is pretty much
cached in RAM. True?

> problematic lines remain. Bumping the query's end date by a single
> year causes a full table scan. How do I persuade PostgreSQL to use the
> indexes, regardless of number of years between the two dates?

I don't know about "regardless of the number of years" -- but you can make
such plans look more attractive by cutting both random_page_cost and
seq_page_cost. Some highly cached loads perform well with these set to
equal values on the order of 0.1 to 0.001.

> (A full table scan against 43 million rows is probably not the best
> plan.)

It would tend to be better than random access to 43 million rows, at least
if you need to go to disk for many of them.

-Kevin
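Both knobs can be tested per session before editing postgresql.conf; a
minimal sketch:

  SET seq_page_cost = 0.1;
  SET random_page_cost = 0.1;  -- equal values tell the planner the data
                               -- is effectively all cached
  -- then re-run EXPLAIN ANALYZE on the slow query and see whether the
  -- seq scan becomes an index scan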
Re: [PERFORM] prepared query performs much worse than regular query
On May 21, 2010, at 8:26 PM, Matthew Wakeling wrote:
> On Fri, 21 May 2010, Richard Yen wrote:
>> Any ideas why the query planner chooses a different query plan when
>> using prepared statements?
>
> This is a FAQ. Preparing a statement makes Postgres create a plan,
> without knowing the values that you will plug in, so it will not be as
> optimal as if the values were available. The whole idea is to avoid the
> planning cost each time the query is executed, but if your data is
> unusual it can result in worse plans.

Two things I disagree with.

1. The "whole idea" is not just to avoid planning cost. It is also to
easily avoid SQL injection, reduce query parse time, and to make client
code cleaner and more re-usable.

2. The data does not need to be "unusual". It just needs to have a skewed
distribution. Skewed is not unusual (well, it would be for a primary
key :P).

Maybe the planner could note that a prepared query parameter is on a
high-skew column and build a handful of plans to choose from, or just
partially re-plan on the skewed column with each execution. Or make it
easier for a user to have a prepared statement that re-plans the query
each time. Even just a per-connection parameter "SET
prepared.query.cacheplan = FALSE" would do.

> Matthew
>
> --
> Existence is a convenient concept to designate all of the files that an
> executable program can potentially process. -- Fortran77 standard
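A clunkier version of that re-plan-each-time behaviour already exists:
inside a PL/pgSQL function, dynamic SQL is planned afresh at every
execution, with the actual value visible to the planner. A sketch (the
table and column names are invented for illustration):

  CREATE OR REPLACE FUNCTION count_matching(p_val integer)
  RETURNS bigint AS $$
  DECLARE
      n bigint;
  BEGIN
      -- concatenating the value into the string forces a fresh plan with
      -- p_val as a constant, so skew in "col" is taken into account
      EXECUTE 'SELECT count(*) FROM skewed_table WHERE col = ' || p_val
          INTO n;
      RETURN n;
  END;
  $$ LANGUAGE plpgsql;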
Re: [PERFORM] which hardware setup
Sorry Jesper, I thought I had mentioned it: our dataset is 18GB.

Pedro Axelrud
http://mailee.me
http://softa.com.br
http://flavors.me/pedroaxl

On Tue, May 25, 2010 at 03:21, Jesper Krogh wrote:
>> Option 2:
>> App Server and Postgres: Dual Xeon 5520 quad core with 12GB ram and
>> 2x 146GB 15k RPM SAS (RAID1) disks
>
> You didn't mention your dataset size, but the second option would be
> preferable in most situations, since it gives more of the OS memory for
> disk caching: 12 GB vs 4 GB for the host running pg.
Re: [PERFORM] Query timing increased from 3s to 55s when used as a function instead of select
Have you read this?
http://blog.endpoint.com/2008/12/why-is-my-function-slow.html

99% of the 'function is slow' problems are caused by this.

Have you checked the difference between explain and prepare + explain
execute?

>>> Tyler Hildebrandt 05/25/10 4:59 AM >>>
We're using a function that when run as a select statement outside of the
function takes roughly 1.5s to complete, whereas running an identical
query within a function takes around 55s to complete.

[remainder of the quoted message snipped; the full text, including the
timings and the EXPLAIN ANALYSE VERBOSE output, appears below in this
digest]
Re: [PERFORM] Query timing increased from 3s to 55s when used as a function instead of select
On Tue, May 25, 2010 at 10:55 AM, Merlin Moncure wrote:
> On Tue, May 25, 2010 at 9:41 AM, Tyler Hildebrandt wrote:
>>> I think your problem is here:
>>>
>>> SELECT INTO current_user * FROM
>>> fn_medirota_validate_rota_master(in_currentuser);
>>>
>>> The planner has no knowledge of how many rows this function returns if
>>> it doesn't know the actual parameter. Because of this, the query
>>> forces a seq-scan. Try to rewrite it to something like:
>>>
>>> execute 'select * from fn_medirota_validate_rota_master(' ||
>>> in_currentuser || ')' into current_user
>>
>> Thanks for your response. This doesn't seem to solve our issue,
>> unfortunately.
>>
>> As a side to that, we have the fn_medirota_validate_rota_master calls
>> in a large number of our other functions that are running very well.
>
> Any chance of seeing the function source?

oops! I missed it :-). Looking at your function: what version of postgres?
Have you experimented with RETURN QUERY?

merlin
Re: [PERFORM] Query timing increased from 3s to 55s when used as a function instead of select
On Tue, May 25, 2010 at 9:41 AM, Tyler Hildebrandt wrote:
>> I think your problem is here:
>>
>> SELECT INTO current_user * FROM
>> fn_medirota_validate_rota_master(in_currentuser);
>>
>> The planner has no knowledge of how many rows this function returns if
>> it doesn't know the actual parameter. Because of this, the query forces
>> a seq-scan. Try to rewrite it to something like:
>>
>> execute 'select * from fn_medirota_validate_rota_master(' ||
>> in_currentuser || ')' into current_user
>
> Thanks for your response. This doesn't seem to solve our issue,
> unfortunately.
>
> As a side to that, we have the fn_medirota_validate_rota_master calls in
> a large number of our other functions that are running very well.

Any chance of seeing the function source?

merlin
Re: [PERFORM] tuning pgsql 7.3.7 over RHEL 4.0 32 x86 (2.6.9-5ELsmp)
On Tue, May 25, 2010 at 02:04:07PM +, Juan Pablo Sandoval Rivera wrote:
> Please give me recommendations on the configuration...

The subject line of this message said you're trying to run PostgreSQL
7.3.7. I hope that's a typo, and you really mean 8.3.7, in which case this
suggestion boils down to "upgrade to 8.3.11". But if you're really trying
to run a version that's several years old, the best configuration advice
you can receive is to upgrade to something not totally prehistoric. There
have been major performance enhancements in each release since 7.3.7, and
no amount of hardware tuning will make such an old version perform
comparatively well. Not to mention the much greater risk that an
unsupported version will eat your data.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com
[PERFORM] tuning pgsql 7.3.7 over RHEL 4.0 32 x86 (2.6.9-5ELsmp)
Good day, list.

I would appreciate some comments on the following. I have a Dell PowerEdge
SC1420 server with 2 GB of RAM, one 73 GB Ultra320 SCSI hard disk, and two
Xeon processors (4 cache), running PGSQL 7.3.7 on GNU/Linux Red Hat
Enterprise 4.0 for 32-bit (kernel 2.6.9-5ELsmp, Nahant ES). The other
server running the same system and database engine is an HP ProLiant ML150
G6: two Xeons, a Smart Array P410 controller, 2 GB RAM, and two 250 GB
15,000 RPM SATA disks in RAID 1.

I am validating the setup. The execution-time difference of a process
between the two machines is really only a few seconds; the Dell machine is
just slightly quicker, which presumably comes down to the hard-drive
technology. shmmax is set to 500,000,000. I attach the pg_settings output
for both machines.

Please give me recommendations on the configuration: is it correct, or is
it under- or over-tuned? An average of 30 users use the system, and disk
usage is heavy; one table has over 8 million rows and another over 13
million, and the 8-million-row table is used daily. Unfortunately we
cannot yet migrate to 8.x, as that would take time for development,
adjustment, and testing, so I have to make do with the configuration I
mentioned.

Thank you.

Juan Pablo Sandoval Rivera
Tecnologo Prof. en Ing. de Sistemas

Linux User: 322765
msn:    juan_pab...@hotmail.com
yahoo:  juan_pab...@rocketmail.com (juan_pablos.rm)
UIN:    276125187 (ICQ)
Jabber: juan_pab...@www.jabberes.org
Skype:  juan.pablo.sandoval.rivera

SUPPORT ECOSEARCH.COM - Help save the Planet.

Attachments:
  80173-settings7.3.7dell (binary data)
  80173-settings_7.3.7hp (binary data)
Re: [PERFORM] Query timing increased from 3s to 55s when used as a function instead of select
> I think your problem is here:
>
> SELECT INTO current_user * FROM
> fn_medirota_validate_rota_master(in_currentuser);
>
> The planner has no knowledge of how many rows this function returns if
> it doesn't know the actual parameter. Because of this, the query forces
> a seq-scan. Try to rewrite it to something like:
>
> execute 'select * from fn_medirota_validate_rota_master(' ||
> in_currentuser || ')' into current_user

Thanks for your response. This doesn't seem to solve our issue,
unfortunately.

As a side to that, we have the fn_medirota_validate_rota_master calls in a
large number of our other functions that are running very well.

--
Tyler Hildebrandt
Software Developer
ty...@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
020 7631 1555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928
Re: [PERFORM] performance of temporary vs. regular tables
On Tuesday 25 May 2010 11:00:24 Joachim Worringen wrote:
> On 25.05.2010 10:49, Grzegorz Jaśkiewicz wrote:
> > temporary tables are handled pretty much like regular tables. The
> > magic happens at the schema level: a new schema is set up for each
> > connection, so that it can access its own temporary tables. Temporary
> > tables also are not autovacuumed. And that's pretty much most of the
> > differences.
>
> Thanks. So, the Write-Ahead-Logging (being used or not) does not matter?

It does matter quite significantly in my experience, both from an I/O and
a CPU overhead perspective.

Andres
Re: [PERFORM] Query timing increased from 3s to 55s when used as a function instead of select
In response to Tyler Hildebrandt:

> We're using a function that when run as a select statement outside of
> the function takes roughly 1.5s to complete whereas running an identical
> query within a function is taking around 55s to complete.
>
> select * from fn_medirota_get_staff_leave_summary(6);
> Time: 57375.477 ms

I think your problem is here:

  SELECT INTO current_user * FROM
  fn_medirota_validate_rota_master(in_currentuser);

The planner has no knowledge of how many rows this function returns if it
doesn't know the actual parameter. Because of this, the query forces a
seq-scan. Try to rewrite it to something like:

  execute 'select * from fn_medirota_validate_rota_master(' ||
  in_currentuser || ')' into current_user

*untested*

HTH, Andreas

--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431 2EB0 389D 1DC2 3172 0C99
Re: [PERFORM] performance of temporary vs. regular tables
On 25.05.2010 11:38, Grzegorz Jaśkiewicz wrote:
> WAL does the same thing to the DB that journaling does to the FS. Plus
> it allows you to roll back (PITR).
>
> As for the RAM, it will be in RAM as long as the OS decides to keep it
> in the RAM cache, and/or it's in the shared buffers memory.

Or until I commit the transaction? I have not completely disabled
sync-to-disk in my setup, as there are of course situations where new data
comes into the database that needs to be stored in a safe manner.

> Unless you have a lot of doubt about the two, I don't think it makes too
> much sense to set up a ramdisk tablespace yourself. But try it, and see
> for yourself. Make sure that you have logic in place that would set it
> up before postgresql starts, in case you reboot or something.

That's what I thought about when mentioning "increased setup complexity".
Simply adding a keyword like "NONPERSISTENT" to the table creation
statement would be preferred...

Joachim
[PERFORM] Query timing increased from 3s to 55s when used as a function instead of select
We're using a function that when run as a select statement outside of the
function takes roughly 1.5s to complete, whereas running an identical
query within a function takes around 55s to complete. We are lost as to
why placing this query within a function, as opposed to substituting the
variables in a select statement, is so drastically different.

The timings posted here are from a 512MB-memory virtual machine and are
not of major concern on their own, but we are finding the same issue in
our production environment with far superior hardware.

The function can be found here:
http://campbell-lange.net/media/files/fn_medirota_get_staff_leave_summary.sql

---

Timings for the individual components on their own are as follows:

  select * from fn_medirota_validate_rota_master(6);
  Time: 0.670 ms

  select to_date(EXTRACT(YEAR FROM current_date)::text, 'YYYY');
  Time: 0.749 ms

  select * from fn_medirota_people_template_generator(2, 6,
      date'2009-01-01', date'2009-12-31', TRUE) AS templates;
  Time: 68.004 ms

  select * from fn_medirota_people_template_generator(2, 6,
      date'2010-01-01', date'2010-12-31', TRUE) AS templates;
  Time: 1797.323 ms

Copying the exact same FOR-loop SELECT statement from the query above into
the psql query buffer and running it with variable substitution yields the
following:

  Running FOR loop SELECT with variable substitution:
  Time: 3150.585 ms

Whereas invoking the function yields:

  select * from fn_medirota_get_staff_leave_summary(6);
  Time: 57375.477 ms

We have tried using explain analyse to update the query optimiser, dropped
and recreated the function, and have restarted both the machine and the
postgres server multiple times. Any help or advice would be greatly
appreciated.

Kindest regards,
Tyler Hildebrandt

---

EXPLAIN ANALYSE VERBOSE
SELECT * FROM fn_medirota_get_staff_leave_summary(6);

  [verbose output snipped: a FUNCTIONSCAN node (startup_cost 0.00,
  total_cost 260.00, plan_rows 1000, plan_width 85) with a target list of
  id, t_full_name, b_enabled, t_anniversary, n_last_year_annual,
  n_last_year_other, n_this_year_annual, n_this_year_other]
Re: [PERFORM] shared_buffers advice
2010/5/24 Merlin Moncure:
> *) a page fault to disk is a much bigger deal than a fault to pg cache
> vs os cache.

That was my impression. That's why I did not touch our 2/16 GB setting
right away. I guess that 2 more gigabytes in OS cache are better than 2
more (duplicated) gigabytes in PG shared_buffers. In our case 2 GB
shared_buffers appears to be enough to avoid thrashing between OS and PG.

> *) shared_buffers is one of the _least_ important performance settings
> in postgresql.conf
>
> Many settings, like work_mem, planner tweaks, commit settings,
> autovacuum settings

Can you recommend any sources on these parameters, especially commit
settings and planner tweaks?

Thank you so much for the whole answer! Not only does it address the
immediate question, but also many of the unasked ones that I had in the
back of my head. It's brief and gives a broad view over all the
performance concerns. It should be part of the documentation or the first
page of the performance wiki. Have you copied it from somewhere?

--
Konrad Garus
Re: [PERFORM] performance of temporary vs. regular tables
WAL does the same thing to the DB that journaling does to the FS. Plus it
allows you to roll back (PITR).

As for the RAM, it will be in RAM as long as the OS decides to keep it in
the RAM cache, and/or it's in the shared buffers memory.

Unless you have a lot of doubt about the two, I don't think it makes too
much sense to set up a ramdisk tablespace yourself. But try it, and see
for yourself. Make sure that you have logic in place that would set it up
before postgresql starts, in case you reboot or something.
Re: [PERFORM] performance of temporary vs. regular tables
On 25.05.2010 11:15, Thom Brown wrote:
> 2010/5/25 Joachim Worringen:
>> And, is there anything like RAM-only tables? I really don't care
>> whether the staging data is lost on the rare event of a machine crash,
>> or whether the query crashes due to lack of memory (I make sure there's
>> enough w/o paging) - I only care about performance here.
>
> I think you can create a tablespace on a RAM disk, and create a table
> there.

True, but I think this makes the database server configuration more
complex (which is acceptable), and may add dependencies between the server
configuration and the SQL statements for the selection of the tablespace
name (which would be a problem).

But I am a tablespace novice and will look into this "workaround".

thanks, Joachim
Re: [PERFORM] performance of temporary vs. regular tables
2010/5/25 Joachim Worringen:
> On 25.05.2010 10:49, Grzegorz Jaśkiewicz wrote:
>> temporary tables are handled pretty much like regular tables. The magic
>> happens at the schema level: a new schema is set up for each
>> connection, so that it can access its own temporary tables. Temporary
>> tables also are not autovacuumed. And that's pretty much most of the
>> differences.
>
> Thanks. So, the Write-Ahead-Logging (being used or not) does not matter?
>
> And, is there anything like RAM-only tables? I really don't care whether
> the staging data is lost on the rare event of a machine crash, or
> whether the query crashes due to lack of memory (I make sure there's
> enough w/o paging) - I only care about performance here.
>
> Joachim

I think you can create a tablespace on a RAM disk, and create a table
there.

Thom
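A sketch of that approach (untested; the mount point is an example and,
per the warning elsewhere in this thread, must exist and be writable by
the postgres user before the server starts):

  -- assumes a tmpfs/ramdisk mounted at /mnt/pg_ram:
  CREATE TABLESPACE ramspace LOCATION '/mnt/pg_ram';

  -- illustrative staging table placed in that tablespace:
  CREATE TABLE staging_data (
      id      integer,
      payload text
  ) TABLESPACE ramspace;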
Re: [PERFORM] performance of temporary vs. regular tables
On 25.05.2010 10:49, Grzegorz Jaśkiewicz wrote:
> temporary tables are handled pretty much like regular tables. The magic
> happens at the schema level: a new schema is set up for each connection,
> so that it can access its own temporary tables. Temporary tables also
> are not autovacuumed. And that's pretty much most of the differences.

Thanks. So, the Write-Ahead-Logging (being used or not) does not matter?

And, is there anything like RAM-only tables? I really don't care whether
the staging data is lost on the rare event of a machine crash, or whether
the query crashes due to lack of memory (I make sure there's enough w/o
paging) - I only care about performance here.

Joachim
Re: [PERFORM] performance of temporary vs. regular tables
Temporary tables are handled pretty much like regular tables. The magic
happens at the schema level: a new schema is set up for each connection,
so that it can access its own temporary tables.

Temporary tables also are not autovacuumed.

And that's pretty much most of the differences.
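The autovacuum point has a practical consequence for bulk staging work:
after filling a temporary table, refresh its statistics by hand or the
planner estimates blind. A runnable sketch (the data is just a stand-in):

  CREATE TEMPORARY TABLE staging AS
      SELECT g AS id, g % 100 AS bucket
        FROM generate_series(1, 1000000) g;

  ANALYZE staging;  -- no autovacuum/autoanalyze on temp tables, so
                    -- update the stats manually before querying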
[PERFORM] performance of temporary vs. regular tables
Greetings,

in http://archives.postgresql.org/message-id/1056648218.7041.11.ca...@jester,
it is stated that the performance of temporary tables is "the same as a
regular table but without WAL on the table contents" (WAL being
write-ahead logging).

I have a datamining-type application which makes heavy use of temporary
tables to stage (potentially large amounts of) data between different
operations.

To effectively multi-thread this application, I (think I) need to switch
from temporary to regular tables, because:
- the concurrent threads need to use different connections, not cursors,
  to effectively operate concurrently
- temporary tables are not visible across connections (as they are across
  cursors of the same connection)

Thus, I wonder how much this will affect performance. Access on the
temporary table is inserting (millions of) rows once in a single
transaction, potentially updating them all once within a single
transaction, then selecting on them once or more. Of course, eventually
losing the data in these tables is not a problem at all. The threads are
synchronized above the SQL level.

Thanks for any input on how to maximize performance for this application.

Joachim