Re: [PERFORM] Are Indices automatically generated for primary keys?
On 8/18/2010 9:15 AM, Clemens Eisserer wrote:
> Hi, they are generated automatically. Thanks depesz! The reason why I asked was because pgAdmin doesn't display the automatically created indices, which confused me. Thanks, Clemens

pgAdmin caches the entire database layout locally, so the tree view can get very stale. Refresh the tree view with F5, or right-click an item in the tree view and click Refresh to rebuild the list.

***snip***

All legitimate Magwerks Corporation quotations are sent in a .PDF file attachment with a unique ID number generated by our proprietary quotation system. Quotations received via any other form of communication will not be honored. CONFIDENTIALITY NOTICE: This e-mail, including attachments, may contain legally privileged, confidential or other information proprietary to Magwerks Corporation and is intended solely for the use of the individual to whom it is addressed. If the reader of this e-mail is not the intended recipient or authorized agent, the reader is hereby notified that any unauthorized viewing, dissemination, distribution or copying of this e-mail is strictly prohibited. If you have received this e-mail in error, please notify the sender by replying to this message and destroy all occurrences of this e-mail immediately. Thank you.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
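A quick way to confirm the index exists, independent of what pgAdmin's tree view shows (the table name here is made up for illustration):

```sql
-- PRIMARY KEY implicitly creates a unique btree index named <table>_pkey.
CREATE TABLE t (id integer PRIMARY KEY, val text);

-- Ask the catalog directly instead of trusting the GUI:
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 't';
-- should list t_pkey, a UNIQUE index on (id)
```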
Re: [PERFORM] raid10 write performance
On 6/22/2010 4:31 AM, Grzegorz Jaśkiewicz wrote:
> Hi folks, is there a general problem with raid10 performance with postgresql on it? We see very low performance on writes (2-3x slower than on less performant servers). I wonder if it is solely a problem of the raid10 configuration, or if it is postgresql's thing.

RAID 10 is the commonly suggested layout for DBs, as its performance-to-redundancy ratio is good. The question that begs to be asked is: what is the I/O layout on the other servers you're comparing against?

> Would moving the WAL dir to a separate disk help potentially?

Yes, it can have a big impact.
http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm
http://wiki.postgresql.org/wiki/Performance_Analysis_Tools
http://wiki.postgresql.org/wiki/Performance_Optimization

> We're running centos 5.4, and the server config is: x3550 M2, xeon 4c e5530 2.4ghz, 6GB of ram; disks: ibm 300gb 2.5 SAS; raid: serveRAID M5014 SAS/SATA controller; stripe size is the default 128k. thanks.
Re: [PERFORM] too complex query plan for not exists query and multicolumn indexes
Message from Corin wakath...@gmail.com at 03-19-2010 01:26:35 PM --

***snip***

> The intention of the query is to find rows with no partner row. The offset and limit are just to ignore the time needed to send the result to the client.

I don't understand the point of OFFSET; LIMIT will accomplish the same thing. PG still executes the full query; the only difference is that PG skips over the first 100 rows before returning a record.

> SELECT * FROM friends AS f1 WHERE NOT EXISTS (SELECT 1 FROM friends AS f2 WHERE f1.user_id=f2.ref_id AND f1.ref_id=f2.user_id) OFFSET 100 LIMIT 1
>
> Mysql uses this query plan:
> 1 PRIMARY f1 index NULL user_ref 8 NULL 2818860 Using where; Using index
> 2 DEPENDENT SUBQUERY f2 ref user_ref user_ref 8 f1.ref_id,f1.user_id 1 Using index
> Time: 9.8s

If that's a query explain from MySQL, it's worthless. The above carries no information: it does not tell us how long each step took, let alone what the planner expected each step to cost.

> PostgreSQL uses this query plan:
>
> Limit  (cost=66681.50..66681.50 rows=1 width=139) (actual time=7413.489..7413.489 rows=1 loops=1)
>   ->  Merge Anti Join  (cost=40520.17..66681.50 rows=367793 width=139) (actual time=3705.078..7344.256 rows=101 loops=1)
>         Merge Cond: ((f1.user_id = f2.ref_id) AND (f1.ref_id = f2.user_id))
>         ->  Index Scan using user_ref on friends f1  (cost=0.00..26097.86 rows=2818347 width=139) (actual time=0.093..1222.592 rows=1917360 loops=1)
>         ->  Materialize  (cost=40520.17..40555.40 rows=2818347 width=8) (actual time=3704.977..5043.347 rows=1990148 loops=1)
>               ->  Sort  (cost=40520.17..40527.21 rows=2818347 width=8) (actual time=3704.970..4710.703 rows=1990148 loops=1)
>                     Sort Key: f2.ref_id, f2.user_id
>                     Sort Method: external merge  Disk: 49576kB
>                     ->  Seq Scan on friends f2  (cost=0.00..18143.18 rows=2818347 width=8) (actual time=0.015..508.797 rows=2818347 loops=1)
> Total runtime: 7422.516 ms

Here we can see each step PG takes and make informed decisions about which part of the query is slow.
We can see that sorting the rows takes most of the time.

> It's already faster, which is great, but I wonder why the query plan is that complex.

It's not complex; it's showing you all the steps, which MySQL is not showing you.

> I read in the pgsql docs that using a multicolumn key is almost never needed and only a waste of cpu/space. So I dropped the multicolumn key and added two separate keys instead:
>
> CREATE INDEX ref1 ON friends USING btree (ref_id);
> CREATE INDEX user1 ON friends USING btree (user_id);

Where is that at??? I don't recall reading that. PG will only use indexes that exactly match the WHERE/JOIN conditions.

> New query plan:
>
> Limit  (cost=70345.04..70345.04 rows=1 width=139) (actual time=43541.709..43541.709 rows=1 loops=1)
>   ->  Merge Anti Join  (cost=40520.27..70345.04 rows=367793 width=139) (actual time=3356.694..43467.818 rows=101 loops=1)
>         Merge Cond: (f1.user_id = f2.ref_id)
>         Join Filter: (f1.ref_id = f2.user_id)

Take note: the merge has changed. It now joins only on f1.user_id = f2.ref_id, then filters the results down using the other half of the AND condition. Put the index back.

>         ->  Index Scan using user1 on friends f1  (cost=0.00..26059.79 rows=2818347 width=139) (actual time=0.031..1246.668 rows=1917365 loops=1)
>         ->  Materialize  (cost=40520.17..40555.40 rows=2818347 width=8) (actual time=3356.615..14941.405 rows=130503729 loops=1)

Take note of what happened here: rows=130503729. This is because the join is no longer limited the way it was before. Did you run this query against MySQL with the same kind of indexes?

>               ->  Sort  (cost=40520.17..40527.21 rows=2818347 width=8) (actual time=3356.611..4127.435 rows=1990160 loops=1)
>                     Sort Key: f2.ref_id
>                     Sort Method: external merge  Disk: 49560kB
>                     ->  Seq Scan on friends f2  (cost=0.00..18143.18 rows=2818347 width=8) (actual time=0.012..496.174 rows=2818347 loops=1)
> Total runtime: 43550.187 ms
>
> I also wonder why it makes a difference when adding a LIMIT clause to the subselect in an EXISTS subselect. Shouldn't pgsql always stop after finding a row?
> In MySQL it makes no difference in speed; pgsql even gets slower when adding a LIMIT to the EXISTS subselect (I hoped it would get faster?!).

LIMIT is applied last, after all the major work is done.

> SELECT * FROM friends AS f1 WHERE NOT EXISTS (SELECT 1 FROM friends AS f2 WHERE f1.user_id=f2.ref_id AND f1.ref_id=f2.user_id LIMIT 1) OFFSET 100 LIMIT 1
>
> Limit  (cost=6389166.19..6389172.58 rows=1 width=139) (actual time=54540.356..54540.356 rows=1 loops=1)
>   ->  Seq Scan on friends f1  (cost=0.00..9003446.87 rows=1409174 width=139) (actual time=0.511..54460.006 rows=101 loops=1)
>         Filter:
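For what it's worth, here is a sketch of the fix being suggested, using the table and index names from the thread. Restoring the multicolumn index lets the planner merge on both columns at once; the LEFT JOIN form is an equivalent way to ask for "rows with no partner row", and 8.4's planner turns both into the same Anti Join:

```sql
-- Put the two-column index back (names taken from the thread):
CREATE INDEX user_ref ON friends USING btree (user_id, ref_id);

-- Equivalent anti-join formulation of the NOT EXISTS query:
SELECT f1.*
FROM friends AS f1
LEFT JOIN friends AS f2
  ON f1.user_id = f2.ref_id
 AND f1.ref_id  = f2.user_id
WHERE f2.user_id IS NULL;
```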
Re: [PERFORM] How exactly PostgreSQL allocates memory for its needs?
On 2/10/2010 12:10 AM, Anton Maksimenkov wrote:
> Can anybody briefly explain to me how each postgres process allocates memory for its needs? I mean, what is the biggest size of malloc() it may want? How many such chunks? What is the average size of allocations? I think that at first it allocates a contiguous piece of shared memory for shared buffers (rather big, hundreds of megabytes usually, in one chunk). What next? temp_buffers, work_mem, maintenance_work_mem - are they allocated as contiguous pieces too? What about other needs? In what sizes are they typically allocated?

There is no short answer to this; you should read section 18 of the manual,
http://www.postgresql.org/docs/8.4/interactive/runtime-config.html
specifically section 18.4,
http://www.postgresql.org/docs/8.4/interactive/runtime-config-resource.html
and the performance section of the wiki:
http://wiki.postgresql.org/wiki/Performance_Optimization

Here is a link to an annotated postgresql.conf:
http://www.pgcon.org/2008/schedule/attachments/44_annotated_gucs_draft1.pdf

Keep in mind that each connection/client connecting to the server creates a new process on the server. Each of the settings you list above is the maximum amount of memory each one of those sessions is allowed to consume.
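To see what the server is actually configured to use, the settings in question can be inspected from any session, for example:

```sql
-- Per-backend memory ceilings and the shared pool, straight from the server:
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem',
               'maintenance_work_mem', 'temp_buffers');
```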
Re: [PERFORM] perf problem with huge table
On 2/10/2010 5:13 PM, rama wrote:
> In that way, when I need to do a query for a long range (i.e. 1 year) I just take the rows contained in contab_y; if I need a query for a couple of days, I can go on ymd; if I need data for some other timeframe, I can do some cool intersections between the different tables using some huge (but fast) queries. Now, the matter is that this design is hard to maintain, and the tables are difficult to check. What I have tried is to go for a normal approach, using just one table that contains all the data, and some proper indexing. The issue is that this table can easily contain 100M rows :) That's why the other guys do all this work to speed up queries, splitting data across different tables and precalculating the sums. I am here to ask for advice from PGsql experts: what do you think I can do to better manage this situation? Are there some other cases I can take a look at? Maybe some documentation, or some technique that I don't know? Any advice is really appreciated!

Look into table partitioning:
http://www.postgresql.org/docs/current/static/ddl-partitioning.html

It's similar to what you are doing, but it simplifies the queries and the logic needed to access large data sets.
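A minimal sketch of what the linked docs describe, using the inheritance-based partitioning available in the 8.x releases (table and column names here are made up):

```sql
-- Parent table holds no data itself; children carry CHECK constraints
-- that let the planner skip irrelevant partitions.
CREATE TABLE measurements (
    logdate date NOT NULL,
    value   numeric
);

CREATE TABLE measurements_2010_01 (
    CHECK (logdate >= DATE '2010-01-01' AND logdate < DATE '2010-02-01')
) INHERITS (measurements);

CREATE TABLE measurements_2010_02 (
    CHECK (logdate >= DATE '2010-02-01' AND logdate < DATE '2010-03-01')
) INHERITS (measurements);

-- With constraint exclusion on, a range query touches only matching children:
SET constraint_exclusion = on;
SELECT sum(value) FROM measurements
WHERE logdate >= DATE '2010-02-01' AND logdate < DATE '2010-03-01';
```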
Re: [PERFORM] same query in high number of times
Without knowing how much memory goes to each of those settings, and how much work_mem each connection gets, it's kind of hard to tell what is going on. We also need the PG version, the OS, and how big the tables are. It would also be nice to see the query itself with EXPLAIN ANALYZE output. PG does not cache the results of a query, only the tables themselves. The table could be completely cached, but there may be some nasty nested loops causing the problem. What are you expecting the query time to be? Check out http://wiki.postgresql.org/wiki/Performance_Optimization — there is a lot of info on how to tune and diagnose problem queries.

Message from Peter Alban peter.alb...@gmail.com at 06-21-2009 12:54:40 PM --

> Hey folks! Still kind of analyzing the situation, I realized that I do have a reasonably high shared_memory and effective_cache_size, but even when the same query is being run a number of times over ~100-200 concurrent connections, it is not being cached. Shouldn't PG realize that if the table data is the same, the query result set should also be the same? Instead each query takes up to 1-2 seconds. Where do I see what PG does? I can now see the queries that take a long time, but I have no information about what the optimizer does, nor about when the DB decides to table scan or use the cache. cheers, Peter
Re: [PERFORM] same query in high number of times
Peter Alban wrote:
> duration: 2533.734 ms  statement:
>
> SELECT news.url_text, news.title, comments.name, comments.createdate, comments.user_id, comments.comment FROM news, comments WHERE comments.cid = news.id AND comments.published = '1' GROUP BY news.url_text, news.title, comments.name, comments.createdate, comments.user_id, comments.comment ORDER BY comments.createdate DESC LIMIT 3
>
> And here is the query plan:
>
> Limit  (cost=4313.54..4313.55 rows=3 width=595) (actual time=288.525..288.528 rows=3 loops=1)
>   ->  Sort  (cost=4313.54..4347.26 rows=13486 width=595) (actual time=288.523..288.523 rows=3 loops=1)
>         Sort Key: comments.createdate
>         ->  HashAggregate  (cost=3253.60..3388.46 rows=13486 width=595) (actual time=137.521..148.132 rows=13415 loops=1)
>               ->  Hash Join  (cost=1400.73..3051.31 rows=13486 width=595) (actual time=14.298..51.049 rows=13578 loops=1)
>                     Hash Cond: ("outer".cid = "inner".id)
>                     ->  Seq Scan on comments  (cost=0.00..1178.72 rows=13480 width=522) (actual time=0.012..17.434 rows=13418 loops=1)
>                           Filter: (published = 1)
>                     ->  Hash  (cost=1391.18..1391.18 rows=3818 width=81) (actual time=14.268..14.268 rows=3818 loops=1)
>                           ->  Seq Scan on news  (cost=0.00..1391.18 rows=3818 width=81) (actual time=0.021..10.072 rows=3818 loops=1)
>
> The same query is being requested from different sessions. So why is it not being cached?

Because query results are not cached; only the raw tables are. The query is rerun every time it is requested. What is the GROUP BY clause accomplishing? The Sort and HashAggregate are eating up all the time.

> work_mem = 51024  # min 64, size in KB

That's a lot of memory dedicated to work_mem; if you have 30 connections open, this could eat up 1.5 gigs, pushing data out of cache.
Re: [PERFORM] same query in high number of times
Message from Grzegorz Jaśkiewicz gryz...@gmail.com at 06-21-2009 09:36:01 PM --

On Sun, Jun 21, 2009 at 9:01 PM, Justin grafjus...@emproshunts.com wrote:
>> work_mem = 51024  # min 64, size in KB
>
> That's a lot of memory dedicated to work_mem; if you have 30 connections open, this could eat up 1.5 gigs, pushing data out of cache.

I thought work_mem is the maximum memory that can be allocated per connection for sorting, etc. I think it is not allocated when the connection is opened, but only on an 'if needed' basis.
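That matches how the setting behaves: work_mem is a per-sort/per-hash ceiling, allocated only when needed, and it can be changed per session without touching postgresql.conf. For example:

```sql
-- Raise the ceiling for one expensive report, then put it back:
SET work_mem = '64MB';
-- ... run the big sort/aggregate here ...
RESET work_mem;
-- In EXPLAIN ANALYZE output, "Sort Method: external merge  Disk: ..."
-- means the sort spilled to disk; "quicksort  Memory: ..." means it
-- fit within work_mem.
```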
Re: [PERFORM] same query in high number of times
Message from Peter Alban peter.alb...@gmail.com at 06-21-2009 10:59:49 PM --

On Sun, Jun 21, 2009 at 10:01 PM, Justin Graf jus...@emproshunts.com wrote:
>> Peter Alban wrote:
>>> duration: 2533.734 ms  statement:
>>> Limit  (cost=4313.54..4313.55 rows=3 width=595) (actual time=288.525..288.528 rows=3 loops=1)
>>
>> Because query results are not cached; only the raw tables are. The query is rerun every time it is requested. What is the GROUP BY clause accomplishing? The Sort and HashAggregate are eating up all the time.
>
> So this should mean that running such a query on, say, a 5 MB table that is in memory takes 2 secs? Assuming that, we probably have really slow memory :) Besides, the query makes little sense to me, but I don't write the queries (yet), I'm simply looking at the server side. So do you suggest tuning the queries, or shall I rather look for other monitoring tools? cheers, Peter

That's a really tiny table; it should be processed in sub-milliseconds. Something else is going on. The actual time in the EXPLAIN of the query states 288 milliseconds, not the 2533.734 ms you quote above. You have not told us the version of PG or the OS it's running on. Is there anything else running on the server?
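To separate server time from client/network overhead, the query from earlier in the thread can be rerun under EXPLAIN ANALYZE on the server itself; the gap between the logged 2533 ms and the plan's 288 ms usually lives outside the executor:

```sql
EXPLAIN ANALYZE
SELECT news.url_text, news.title, comments.name,
       comments.createdate, comments.user_id, comments.comment
FROM news, comments
WHERE comments.cid = news.id
  AND comments.published = '1'
GROUP BY news.url_text, news.title, comments.name,
         comments.createdate, comments.user_id, comments.comment
ORDER BY comments.createdate DESC
LIMIT 3;

-- And report the server version alongside it:
SELECT version();
```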
Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10
I also just got a new server from Dell two weeks ago; I went with more memory, a slower CPU, and smaller hard drives. I have not run pgbench yet.

Dell PE 2950 III
2 Quad Core 1.866 GHz
16 gigs of ram
8 hard drives, 73 Gig 10k RPM SAS
2 drives mirrored for OS, binaries, and WAL
6 in a RAID 10
Dual Gig Ethernet
OS: Ubuntu 7.10

Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine  Size K/sec %CP  K/sec %CP K/sec %CP K/sec %CP  K/sec %CP  /sec %CP
PriData    7M 51030  90 107488  29 50666  10 38464  65 102931   9 268.2   0

                   ------Sequential Create------ --------Random Create--------
                   -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
             files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++

PriData,7M,51030,90,107488,29,50666,10,38464,65,102931,9,268.2,0,16,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++

The difference in our results is interesting. What are the settings on the RAID card? I have the cache turned on with Read Ahead.

Message from Craig James [EMAIL PROTECTED] at 03-12-2008 09:55:18 PM --

> I just received a new server and thought benchmarks would be interesting. I think this looks pretty good, but maybe there are some suggestions about the configuration file. This is a web app, a mix of read/write, where writes tend to be "insert into ... (select ...)" where the resulting insert is on the order of 100 to 10K rows of two integers. An external process also uses a LOT of CPU power along with each query. Thanks, Craig
>
> Configuration: Dell 2950, 8 CPU (Intel 2GHz Xeon), 8 GB memory, Dell Perc 6i with battery-backed cache, RAID 10 of 8x 146GB SAS 10K 2.5" disks. Everything (OS, WAL and databases) is on the one RAID array.
> Diffs from original configuration:
>
> max_connections = 1000
> shared_buffers = 400MB
> work_mem = 256MB
> max_fsm_pages = 100
> max_fsm_relations = 5000
> wal_buffers = 256kB
> effective_cache_size = 4GB
>
> Bonnie output (slightly reformatted) --
> Delete files in random order...done.
>
> Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
>                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>          Size K/sec %CP  K/sec %CP  K/sec %CP K/sec %CP  K/sec %CP  /sec %CP
>           16G 64205  99 234252  38 112924  26 65275  98 293852  24 940.3   1
>
>                    ------Sequential Create------ --------Random Create--------
>                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                 16 12203  95 +++++ +++ 19469  94 12297  95 +++++ +++ 15578  82
>
> www.xxx.com,16G,64205,99,234252,38,112924,26,65275,98,293852,24,940.3,1,16,12203,95,+++++,+++,19469,94,12297,95,+++++,+++,15578,82
>
> $ pgbench -c 10 -t 1 -v test -U test
> starting vacuum...end.
> starting vacuum accounts...end.
> transaction type: TPC-B (sort of)
> scaling factor: 1
> number of clients: 10
> number of transactions per client: 1
> number of transactions actually processed: 10/10
> tps = 2786.377933 (including connections establishing)
> tps = 2787.888209 (excluding connections establishing)
Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10
I did not run into a single install problem. I read a thread about people having problems, but that thread is over a year old now. I used the 7.10 Gutsy amd64 server version, then installed the GNOME desktop because it's not installed by default (I'm a Windows admin, I have to have my GUI), then installed postgres 8.3 for Gutsy. It took about 3 hours to get the server set up.

Message from Will Weaver [EMAIL PROTECTED] at 03-13-2008 08:11:06 AM --

> Justin, This may be a bit out of context, but did you run into any trouble getting your Perc6i RAID controller to work under Ubuntu 7.10? I've heard there were issues with that. Thanks, Will
>
> On Mar 13, 2008, at 3:11 AM, Justin Graf wrote:
>
> ***snip*** (full requote of the benchmark message above)