Re: [PERFORM] Slow query with 3 table joins
This looks like the same optimizer problem that occasionally plagues our customers. Whenever the estimated row count of a join is 1 but the actual row count is higher, the optimizer may choose very poor plans. I made some attempts to fix this. The very simple fix is to never estimate 1 for a join result. Even using 2 works remarkably well as a defense against this problem.

https://github.com/labkey-matthewb/postgres/commit/b1fd99f4deffbbf3db2172ccaba51a34f18d1b1a

I also made a much more correct but complicated patch to track both uniqueness and selectivity through the optimizer, but I didn't quite push that over the finish line (I made a mistake in the hash join code, and got distracted by my day job before finishing it).

https://github.com/labkey-matthewb/postgres/commits/struct_selectivity

The second patch is certainly the better approach, but it needs someone to pick up the mission.

Matt

On Wed, Apr 26, 2017 at 8:00 AM, Gerardo Herzig wrote:
> Some other approaches you could try:
>
> 1) What about a hashed index? You could make
>
> CREATE INDEX ON FIELD (unit_id, hashtext(field_name))
>
> and change your query accordingly:
>
> where hashtext(FIELD.FIELD_NAME)=hashtext('SHEETS_PRESENT')
>
> 2) Partitioning (not native yet, but it can be simulated through
> inheritance), as in
> https://www.postgresql.org/docs/current/static/ddl-partitioning.html
> This could work well if you have a limited number of distinct values in
> FIELD.FIELD_NAME
>
> Gerardo
>
> ----- Original Message -----
> > From: "Alessandro Ferrucci"
> > To: pgsql-performance@postgresql.org
> > Sent: Wednesday, April 26, 2017 0:19:37
> > Subject: Re: [PERFORM] Slow query with 3 table joins
> >
> > After about 40 minutes the slow query finally finished, and the result
> > of the EXPLAIN plan can be found here:
> >
> > https://explain.depesz.com/s/BX22
> >
> > Thanks,
> > Alessandro Ferrucci
> >
> > On Tue, Apr 25, 2017 at 11:10 PM, Alessandro Ferrucci
> > <alessandroferru...@gmail.com> wrote:
> >
> > Hello - I am migrating a current system to PostgreSQL and I am having
> > an issue with a relatively straightforward query being extremely slow.
> >
> > The following are the definitions of the tables:
> >
> > CREATE TABLE popt_2017.unit
> > (
> >   id serial NOT NULL,
> >   unit_id text,
> >   batch_id text,
> >   create_date timestamp without time zone DEFAULT now(),
> >   update_date timestamp without time zone,
> >   CONSTRAINT unit_pkey PRIMARY KEY (id)
> > )
> > WITH (
> >   OIDS=FALSE
> > );
> >
> > CREATE TABLE popt_2017.field
> > (
> >   id serial NOT NULL,
> >   unit_id integer,
> >   subunit_data_id integer,
> >   field_name character varying(50),
> >   page_id character varying(20),
> >   page_type character varying(20),
> >   batch_id character varying(20),
> >   file_name character varying(20),
> >   data_concept integer,
> >   "GROUP" integer,
> >   omr_group integer,
> >   pres integer,
> >   reg_data text,
> >   ocr_conf text,
> >   ocr_dict text,
> >   ocr_phon text,
> >   create_date timestamp without time zone DEFAULT now(),
> >   update_date timestamp without time zone,
> >   CONSTRAINT field_pkey PRIMARY KEY (id),
> >   CONSTRAINT field_subunit_data_id_fkey FOREIGN KEY (subunit_data_id)
> >     REFERENCES popt_2017.subunit (id) MATCH SIMPLE
> >     ON UPDATE NO ACTION ON DELETE NO ACTION,
> >   CONSTRAINT field_unit_id_fk FOREIGN KEY (unit_id)
> >     REFERENCES popt_2017.unit (id) MATCH FULL
> >     ON UPDATE NO ACTION ON DELETE NO ACTION,
> >   CONSTRAINT field_unit_id_fkey FOREIGN KEY (unit_id)
> >     REFERENCES popt_2017.unit (id) MATCH SIMPLE
> >     ON UPDATE NO ACTION ON DELETE NO ACTION
> > )
> > WITH (
> >   OIDS=FALSE
> > );
> >
> > CREATE TABLE popt_2017.answer
> > (
> >   id serial NOT NULL,
> >   field_id integer,
> >   ans_status integer,
> >   ans text,
> >   luggage text,
> >   arec text,
> >   kfi_partition integer,
> >   final boolean,
> >   length integer,
> >   create_date timestamp without time zone DEFAULT now(),
> >   update_date timestamp without time zone,
> >   CONSTRAINT answer_pkey PRIMARY KEY (id),
> >   CONSTRAINT answer_field_id_fk FOREIGN KEY (field_id)
> >     REFERENCES popt_2017.field (id) MATCH FULL
> >     ON UPDATE NO ACTION ON DELETE NO ACTION,
> >   CONSTRAINT answer_field_id_fkey FOREIGN KEY (field_id)
> >     REFERENCES popt_2017.field (id) MATCH SIMPLE
> >     ON UPDATE NO ACTION ON DELETE NO ACTION
> > )
> > WITH (
> >   OIDS=FALSE
> > );
> >
> > Below are the index definitions for those tables:
> >
> > UNIT:
> > CREATE UNIQUE INDEX unit_pkey ON unit USING btree (id);
> > CREATE INDEX unit_unit_id_idx ON unit USING btree (unit_id);
> >
> > FIELD:
> > CREATE UNIQUE INDEX field_pkey ON field USING btree (id)
> > CREATE INDEX field_unit_id_idx ON field USING btree (unit_id)
> > CREATE INDEX field_subunit_id_idx ON field USING btree (subunit_data_id)
> > CREATE INDEX field_field_name_idx ON field USING btree (field_name)
> >
> > ANSWER:
> > CREATE UNIQUE INDEX answer_pkey ON answer
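For readers of this thread in the archive: written out in full against the schema above, Gerardo's hashed-index suggestion would look something like the sketch below. The index name and the literal unit_id value are made up, hashtext() is PostgreSQL's internal (undocumented) hash function, and the recheck on the raw value is an addition here to guard against hash collisions:

    CREATE INDEX field_unit_hash_idx
        ON popt_2017.field (unit_id, hashtext(field_name));

    SELECT f.*
      FROM popt_2017.field f
     WHERE f.unit_id = 12345
       AND hashtext(f.field_name) = hashtext('SHEETS_PRESENT')
       AND f.field_name = 'SHEETS_PRESENT';   -- recheck: hashtext can collide

This indexes a fixed-size integer instead of a 50-character string; note that the planner will only use the expression index if the query repeats the hashtext(...) expression exactly.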
[PERFORM] Performance difference between Slon master and slave
Hello all, I hope someone can help me with this.

Postgres 9.4.4
Slon 2.2.4
Linux

I am using slony-i to replicate a production database which is in the order of 70GB. I have a reasonably complex select query that runs in 40 seconds on the master but takes in the region of 30-40 minutes on the slave. The postgres configurations are identical and the machines are of similar specification (12-core hyper-threaded HP servers; the slave has slightly less RAM: 132GB vs 148GB). The server running the slave database has a higher load than the one running the master, though the load average on the slave machine was low (1-2) when running the test, and the postgres process on the slave machine runs at 100% of a CPU with very little iowait on the server.

Inspecting the execution plan shows that there are some differences; for example, the slave is using a HashAggregate where the master is simply grouping. There also seems to be a difference in the ordering of the sub-plans. Armed with this knowledge I set enable_hashagg to off and ran the query again, and it now takes 53 seconds on the slave, which is a more acceptable difference; the execution plans now look very similar (one difference being that there is another HashAggregate in the master which is now missing on the slave and may account for the 13 seconds).

I have isolated a much simpler query, detailed below with its execution plans, which shows the difference on line 4. I would rather not disable hash aggregation on the slave as this might have other consequences, so this raises a number of questions. Firstly, is there anything I can do to stop this behaviour? Why is the slave behaving differently to the master?

Thanks in advance for any help.

Cheers

Matthew

explain with my_view_booking_pax_breakdown as (
 SELECT bev.booking_id,
    ( SELECT count(*) AS count
        FROM passenger_version
       WHERE passenger_version.current_version = 'T'::bpchar
         AND passenger_version.deleted = 'F'::bpchar
         AND passenger_version.indicative_pax_type = 'A'::bpchar
         AND passenger_version.booking_id = bev.booking_id) AS adult_count,
    ( SELECT count(*) AS count
        FROM passenger_version
       WHERE passenger_version.current_version = 'T'::bpchar
         AND passenger_version.deleted = 'F'::bpchar
         AND passenger_version.indicative_pax_type = 'C'::bpchar
         AND passenger_version.booking_id = bev.booking_id) AS child_count,
    ( SELECT count(*) AS count
        FROM passenger_version
       WHERE passenger_version.current_version = 'T'::bpchar
         AND passenger_version.deleted = 'F'::bpchar
         AND passenger_version.indicative_pax_type = 'I'::bpchar
         AND passenger_version.booking_id = bev.booking_id) AS infant_count
   FROM booking_expanded_version bev
  GROUP BY bev.booking_id
)
select * from "my_view_booking_pax_breakdown" "view_booking_pax_breakdown"
INNER JOIN "booking"."booking_expanded_version" "booking_expanded_version"
ON "view_booking_pax_breakdown"."booking_id" = "booking_expanded_version"."booking_id"

Master

"Merge Join  (cost=5569138.32..6158794.12 rows=2461265 width=1375)"
"  Merge Cond: (booking_expanded_version.booking_id = view_booking_pax_breakdown.booking_id)"
"  CTE my_view_booking_pax_breakdown"
*"    ->  Group  (cost=0.43..5545692.19 rows=215891 width=4)"*
"          Group Key: bev.booking_id"
"          ->  Index Only Scan using booking_expanded_version_booking_idx on booking_expanded_version bev  (cost=0.43..64607.40 rows=2461265 width=4)"
"          SubPlan 1"
"            ->  Aggregate  (cost=8.57..8.58 rows=1 width=0)"
"                  ->  Index Scan using passenger_version_idx_4 on passenger_version  (cost=0.43..8.55 rows=5 width=0)"
"                        Index Cond: (booking_id = bev.booking_id)"
"          SubPlan 2"
"            ->  Aggregate  (cost=8.45..8.46 rows=1 width=0)"
"                  ->  Index Scan using passenger_version_idx_3 on passenger_version passenger_version_1  (cost=0.42..8.45 rows=1 width=0)"
"                        Index Cond: (booking_id = bev.booking_id)"
"          SubPlan 3"
"            ->  Aggregate  (cost=8.31..8.32 rows=1 width=0)"
"                  ->  Index Scan using passenger_version_idx_2 on passenger_version passenger_version_2  (cost=0.29..8.31 rows=1 width=0)"
"                        Index Cond: (booking_id = bev.booking_id)"
"  ->  Index Scan using booking_expanded_version_booking_idx on booking_expanded_version  (cost=0.43..546584.09 rows=2461265 width=1347)"
"  ->  Sort  (cost=23445.70..23985.43 rows=215891 width=28)"
"        Sort Key: view_booking_pax_breakdown.booking_id"
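If disabling hash aggregation really is the only workable fix, it can at least be scoped to the session or transaction that runs the report instead of being set slave-wide. A minimal sketch (enable_hashagg is an ordinary user-settable planner GUC; SET LOCAL reverts automatically at commit or rollback):

    BEGIN;
    SET LOCAL enable_hashagg = off;
    -- run the reporting query here
    COMMIT;

That said, a plan difference between two machines with identical configuration most often comes down to different table statistics, so comparing pg_stats for the tables involved (and running ANALYZE on the slave) is worth doing before forcing the planner's hand.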
[PERFORM] Query optimizer plans with very small selectivity estimates
This is related to a post in the general bugs forum, but I found this forum, which seems more appropriate. This is my second attempt to post; I believe the first attempt last week did not work. Apologies if I'm duplicating.

http://comments.gmane.org/gmane.comp.db.postgresql.bugs/39011

I have had several users encounter performance problems, which all seem to come down to this: multiplying selectivity estimates can cause tuple estimates to grow very small very quickly. Once the estimator gets to 1 row, the planner may choose plans that are very good ONLY WHEN there is exactly 1 row (maybe even O(N^large)). Unfortunately, these may be the worst plans if the estimate is even slightly off (even just returning 2 or 3 rows versus 1).

Using the patch below, I discovered that clamping relation tuple estimates to a number as small as 2 seemed to avoid all the catastrophic query plans.

In the scenarios I'm seeing, I have several examples of queries that take >1m to run that should run in <1s. The estimate of 1 row (versus thousands actual) leads the planner to tee up several nested loop joins, which causes thousands of table scans.

I have been working on a more complete patch which tracks uniqueness along with selectivity, so that the optimizer can benefit from knowing when a relation must have 1 (or fewer) tuples, while clamping all other relations to 2 rather than 1.

typedef struct
{
    double selectivity;
    bool   unique;
} Selectivity;

I am interested in hearing discussion about this problem, and whether the community is open to a patch if I continue pursuing the development.

Matt

FIRST ARTIFACT

Plan with an expensive (80s) join and a join estimate of 1; note the first Nested Loop join and the 81s join (I gave up trying to post the full explain, because of the 80 char limit):

"Sort  (cost=7000.04..7000.04 rows=1 width=49) (actual time=81739.426..81740.023 rows=5091 loops=1)"
"  Sort Key: c.ten DESC"
"  Sort Method: quicksort  Memory: 948kB"
"  CTE cte"
"    ->  Values Scan on "*VALUES*"  (cost=0.00..0.06 rows=5 width=4) (actual time=0.001..0.001 rows=5 loops=1)"
"  ->  Nested Loop  (cost=1.36..6999.97 rows=1 width=49) (actual time=0.059..81725.475 rows=5091 loops=1)"
"Planning time: 1.912 ms"
"Execution time: 81740.328 ms"

SECOND ARTIFACT

Forcing the join row estimate to be a minimum of 2; the query completes very quickly:

"Sort  (cost=7000.06..7000.06 rows=2 width=49) (actual time=84.610..85.192 rows=5142 loops=1)"
"  Sort Key: c.ten DESC"
"  Sort Method: quicksort  Memory: 956kB"
"  CTE cte"
"    ->  Values Scan on "*VALUES*"  (cost=0.00..0.06 rows=5 width=4) (actual time=0.002..0.003 rows=5 loops=1)"
"  ->  Hash Join  (cost=2518.99..6999.98 rows=2 width=49) (actual time=17.629..82.886 rows=5142 loops=1)"
"Planning time: 2.982 ms"
"Execution time: 85.514 ms"

THIRD ARTIFACT

The patch I used to make experimenting easier without recompiling:

index 1b61fd9..444703c 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -68,6 +68,12 @@
  *-
  */
+
+
+/* These parameters are set by GUC */
+int join_row_estimate_clamp = 1;
+
+
 #include "postgres.h"
 
 #ifdef _MSC_VER
@@ -175,6 +181,17 @@ clamp_row_est(double nrows)
 }
 
+double
+clamp_join_row_est(double nrows)
+{
+    nrows = clamp_row_est(nrows);
+    if (nrows >= (double) join_row_estimate_clamp)
+        return nrows;
+    return (double) join_row_estimate_clamp;
+}
+
+
 /*
  * cost_seqscan
  *    Determines and returns the cost of scanning a relation sequentially.
@@ -3886,7 +3903,7 @@ calc_joinrel_size_estimate(PlannerInfo *root,
         break;
     }
 
-    return clamp_row_est(nrows);
+    return clamp_join_row_est(nrows);
 }
 
 /*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 71090f2..fabb8ac 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2664,6 +2664,16 @@ static struct config_int ConfigureNamesInt[] =
         NULL, NULL, NULL
     },
 
+    {
+        {"join_row_estimate_clamp", PGC_USERSET, QUERY_TUNING_OTHER,
+            gettext_noop("Set the minimum estimated size of a join result."),
+            NULL
+        },
+        &join_row_estimate_clamp,
+        1, 1, 1,
+        NULL, NULL, NULL
+    },
+
     /* End-of-list marker */
     {
         {NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 25a7303..0161c4b 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -67,8 +67,10 @@ extern bool enable_material;
 extern bool enable_mergejoin;
 extern bool enable_hashjoin;
 extern int constraint_exclusion;
+extern int join_row_estimate_clamp;
 
 extern double clamp_row_est(double nrows);
+extern double clamp_join_row_est(double nrows);
 extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
                   double index_pages, PlannerInfo *root);
 extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
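For anyone wanting to reproduce the experiment: the patch exposes the clamp as an ordinary session GUC (PGC_USERSET), so the second artifact corresponds to something like the following - assuming the patch is built in, and assuming the GUC's upper bound allows values above 1 (the bounds in the posted hunk look garbled in the archive):

    SET join_row_estimate_clamp = 2;
    EXPLAIN ANALYZE SELECT ...;   -- the problem query; a Hash Join replaces the Nested Loop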
Re: [PERFORM] Stalls on PGSemaphoreLock
Hi all - I am a little delayed in reporting back on this issue, but it was indeed the hugepage defrag setting that was the cause of my issue. One item that we noticed as we were testing, which I wanted to report back to the forum, is that these settings:

cat /sys/kernel/mm/transparent_hugepage/defrag
always [never]

cat /sys/kernel/mm/transparent_hugepage/enabled
always [never]

were not sticky on reboot for my version of CentOS, which probably explains why I thought this was disabled already, only to have it crop back up. Anyway, I wanted to report back these findings to close the loop on this and to thank the community again for their support.

Best,
Matt

From: Pavy Philippe [philippe.p...@worldline.com]
Sent: Tuesday, March 25, 2014 4:10 PM
To: Matthew Spilich; pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Stalls on PGSemaphoreLock

Here, we had transparent hugepages always active:

cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
[always] never

We changed to:

cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
always [never]

For the semaphores, our initial configuration was:

cat /proc/sys/kernel/sem
250 32000 32 128

And we changed to:

cat /proc/sys/kernel/sem
5010641280 5010128

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On behalf of Matthew Spilich
Sent: Tuesday, March 25, 2014 19:38
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Stalls on PGSemaphoreLock

Thanks all:

Ray: Thanks, we started to look at the hardware/firmware, but didn't get to that level of detail or to running sar. I will probably collect more detail in this area if I continue to see issues.

Pavy: I hope that you are right that the hugepage setting is the issue. I was under the impression that I had it disabled already, because this has been a known issue for us in the past, but it turns out this was not the case for the server in question. I have disabled it at this time, but it will take a few days of running without issue before I am comfortable declaring that this is the solution. Can you elaborate on the change you mention to upgrade the semaphore configuration? I think this is not something I have looked at before.

Ashutosh: Thanks for the reply, I started to do that at first. I turned on log_statement=all for a few hours and generated a few GB of log file, and I didn't want to leave it running in that state for too long because the issue happens every few days, and not on any regular schedule, so I reverted that after collecting a few GB of detail in the pg log. What I'm doing now - sampling every few seconds - is, I think, giving me a decent picture of what is going on when the incident occurs, and is a level of data collection that I am more comfortable will not impact operations. I am also logging at the level of 'mod' and all durations over 500ms. I don't see that large write operations are a contributing factor leading up to these incidents.

I'm hoping that disabling the hugepage setting will be the solution to this. I'll check back in a day or two with feedback.

Thanks,
Matt

From: Pavy Philippe [philippe.p...@worldline.com]
Sent: Tuesday, March 25, 2014 1:45 PM
To: Ray Stell; Matthew Spilich
Cc: pgsql-performance@postgresql.org
Subject: RE : [PERFORM] Stalls on PGSemaphoreLock

Hello

Recently I had a similar problem. The first symptom was a freeze of the connections and 100% system CPU for between 2 and 10 minutes, 1 or 2 times per day. Connection impossible, slow queries. The strace on one backend showed a very long system call on semop(). We have a node with 48 cores and 128 GB of memory. We disabled hugepages and upgraded the semaphore configuration, and since then we no longer have any freezes on our instance. Can you check the hugepage and semaphore configuration on your node? I am interested in this case, so do not hesitate to report back. Thanks.

excuse me for my bad english !!!

From: pgsql-performance-ow...@postgresql.org [pgsql-performance-ow...@postgresql.org] on behalf of Ray Stell [ste...@vt.edu]
Sent: Tuesday, March 25, 2014 18:17
To: Matthew Spilich
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Stalls on PGSemaphoreLock

On Mar 25, 2014, at 8:46 AM, Matthew Spilich wrote:

> The symptom: The database machine (running postgres 9.1.9 on CentOS 6.4) is running at low utilization most of the time, but once every day or two, it will appear to slow down to the point where queries back up and clients are unable to connect. Once this event occurs, there are lots of concurrent queries, I see slow queries appear in the logs, but there doesn't appear to be anything abnormal that I have been able
[PERFORM] semaphore waits and performance stall
Hi everyone! I've been working on a puzzling issue for a few days and am hoping that someone has seen something similar or can help. There have been some odd behaviors on one of my production-facing postgres servers.

Version info from postgres: PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit

The symptom: the database machine (running postgres 9.1.9 on CentOS 6.4) is running at low utilization most of the time, but once every day or two it will appear to slow down to the point where queries back up and clients are unable to connect. Once this event occurs, there are lots of concurrent queries and I see slow queries appear in the logs, but there doesn't appear to be anything abnormal that I have been able to see that causes this behavior. The event will occur just long enough for monitoring to alarm. The team will respond to alerts to take a look, but within a minute or three at most, load returns to normal levels and all running queries complete in expected times.

At the time of the event, we see a spike in system CPU and load average, but we do not see a corresponding spike in disk reads or writes which would indicate IO load. Initial troubleshooting to monitor active processes led us to see a flurry of activity in ps waiting on semtimedop.

Our efforts internally to diagnose this problem are to sample pg_locks and pg_stat_activity every 5s, plus running a script that looks for at least one postgres process waiting on a semaphore; if it finds one, it gets a stack trace of every running postgres process with GDB. It also uses strace on 5 processes to find out which semaphore they're waiting on. What we were catching in the following stack traces seems to be representative of where things are waiting when we see an event - here are two examples that are representative:

--- 47245 ---
0x0037392eb197 in semop () from /lib64/libc.so.6
#0  0x0037392eb197 in semop () from /lib64/libc.so.6
#1  0x005e0c87 in PGSemaphoreLock ()
#2  0x0061e3af in LWLockAcquire ()
#3  0x0060aa0f in ReadBuffer_common ()
#4  0x0060b2e4 in ReadBufferExtended ()
#5  0x0047708d in _bt_relandgetbuf ()
#6  0x0047aac4 in _bt_search ()
#7  0x0047af8d in _bt_first ()
#8  0x00479704 in btgetbitmap ()
#9  0x006e7e00 in FunctionCall2Coll ()
#10 0x00473120 in index_getbitmap ()
#11 0x005726b8 in MultiExecBitmapIndexScan ()
#12 0x0057214d in BitmapHeapNext ()
#13 0x0056b18e in ExecScan ()
#14 0x00563ed8 in ExecProcNode ()
#15 0x00562d72 in standard_ExecutorRun ()
#16 0x0062ce67 in PortalRunSelect ()
#17 0x0062e128 in PortalRun ()
#18 0x0062bb66 in PostgresMain ()
#19 0x005ecd01 in ServerLoop ()
#20 0x005ef401 in PostmasterMain ()
#21 0x00590ff8 in main ()

--- 47257 ---
0x0037392eb197 in semop () from /lib64/libc.so.6
#0  0x0037392eb197 in semop () from /lib64/libc.so.6
#1  0x005e0c87 in PGSemaphoreLock ()
#2  0x0061e3af in LWLockAcquire ()
#3  0x0060aa0f in ReadBuffer_common ()
#4  0x0060b2e4 in ReadBufferExtended ()
#5  0x0047708d in _bt_relandgetbuf ()
#6  0x0047aac4 in _bt_search ()
#7  0x0047af8d in _bt_first ()
#8  0x004797d1 in btgettuple ()
#9  0x006e7e00 in FunctionCall2Coll ()
#10 0x0047339d in index_getnext ()
#11 0x00575ed6 in IndexNext ()
#12 0x0056b18e in ExecScan ()
#13 0x00563ee8 in ExecProcNode ()
#14 0x00562d72 in standard_ExecutorRun ()
#15 0x0062ce67 in PortalRunSelect ()
#16 0x0062e128 in PortalRun ()
#17 0x0062bb66 in PostgresMain ()
#18 0x005ecd01 in ServerLoop ()
#19 0x005ef401 in PostmasterMain ()
#20 0x00590ff8 in main ()

Has anyone on the forum seen something similar? Any suggestions on what to look at next? If it is helpful to describe the server hardware: it has 2 E5-2670 CPUs and 256 GB of RAM, and the database is hosted on 1.6TB of RAID 10 local storage (15K 300 GB drives). The workload is predominantly read, and the queries are mostly fairly simple selects from a single large table, generally specifying the primary key as part of the where clause along with a few other filters.

Thanks,
Matt
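For reference, the 5-second sampling described above can be done with a query along these lines (a sketch using the 9.1-era pg_stat_activity columns procpid, waiting and current_query; later releases renamed these to pid, wait_event and query):

    SELECT now() AS sample_time, procpid, waiting, current_query
      FROM pg_stat_activity
     WHERE waiting;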
[PERFORM] Stalls on PGSemaphoreLock
Hi everyone! I've been working on a puzzling issue for a few days and am hoping that someone has seen something similar or can help. There have been some odd behaviors on one of my production-facing postgres servers.

Version info from postgres: PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit

The symptom: the database machine (running postgres 9.1.9 on CentOS 6.4) is running at low utilization most of the time, but once every day or two it will appear to slow down to the point where queries back up and clients are unable to connect. Once this event occurs, there are lots of concurrent queries and I see slow queries appear in the logs, but there doesn't appear to be anything abnormal that I have been able to see that causes this behavior. The event will occur just long enough for monitoring to alarm. We will respond to alerts to take a look, but within a minute or three at most, load returns to normal levels and all running queries complete in expected times.

At the time of the event, we see a spike in system CPU and load average, but we do not see a corresponding spike in disk reads or writes which would indicate IO load. Initial troubleshooting to monitor active processes led us to see a flurry of activity in ps waiting on semtimedop.

Our efforts internally to diagnose this problem are to sample pg_locks and pg_stat_activity every 5s, plus running a script that looks for at least one postgres process waiting on a semaphore; if it finds one, it gets a stack trace of every running postgres process with GDB. It also uses strace on 5 processes to find out which semaphore they're waiting on. What we were catching in the following stack traces seems to be representative of where things are waiting when we see an event - here are two examples that are representative; lots of threads will appear to be in this state:

--- 47245 ---
0x0037392eb197 in semop () from /lib64/libc.so.6
#0  0x0037392eb197 in semop () from /lib64/libc.so.6
#1  0x005e0c87 in PGSemaphoreLock ()
#2  0x0061e3af in LWLockAcquire ()
#3  0x0060aa0f in ReadBuffer_common ()
#4  0x0060b2e4 in ReadBufferExtended ()
#5  0x0047708d in _bt_relandgetbuf ()
#6  0x0047aac4 in _bt_search ()
#7  0x0047af8d in _bt_first ()
#8  0x00479704 in btgetbitmap ()
#9  0x006e7e00 in FunctionCall2Coll ()
#10 0x00473120 in index_getbitmap ()
#11 0x005726b8 in MultiExecBitmapIndexScan ()
#12 0x0057214d in BitmapHeapNext ()
#13 0x0056b18e in ExecScan ()
#14 0x00563ed8 in ExecProcNode ()
#15 0x00562d72 in standard_ExecutorRun ()
#16 0x0062ce67 in PortalRunSelect ()
#17 0x0062e128 in PortalRun ()
#18 0x0062bb66 in PostgresMain ()
#19 0x005ecd01 in ServerLoop ()
#20 0x005ef401 in PostmasterMain ()
#21 0x00590ff8 in main ()

--- 47257 ---
0x0037392eb197 in semop () from /lib64/libc.so.6
#0  0x0037392eb197 in semop () from /lib64/libc.so.6
#1  0x005e0c87 in PGSemaphoreLock ()
#2  0x0061e3af in LWLockAcquire ()
#3  0x0060aa0f in ReadBuffer_common ()
#4  0x0060b2e4 in ReadBufferExtended ()
#5  0x0047708d in _bt_relandgetbuf ()
#6  0x0047aac4 in _bt_search ()
#7  0x0047af8d in _bt_first ()
#8  0x004797d1 in btgettuple ()
#9  0x006e7e00 in FunctionCall2Coll ()
#10 0x0047339d in index_getnext ()
#11 0x00575ed6 in IndexNext ()
#12 0x0056b18e in ExecScan ()
#13 0x00563ee8 in ExecProcNode ()
#14 0x00562d72 in standard_ExecutorRun ()
#15 0x0062ce67 in PortalRunSelect ()
#16 0x0062e128 in PortalRun ()
#17 0x0062bb66 in PostgresMain ()
#18 0x005ecd01 in ServerLoop ()
#19 0x005ef401 in PostmasterMain ()
#20 0x00590ff8 in main ()

Has anyone on the forum seen something similar? Any suggestions on what to look at next? If it is helpful to describe the server hardware: it has 2 E5-2670 CPUs and 256 GB of RAM, and the database is hosted on 1.6TB of RAID 10 local storage (15K 300 GB drives). The workload is predominantly read, and the queries are mostly fairly simple selects from a single large table, generally specifying the primary key as part of the where clause along with a few other filters.

Thanks,
Matt
Re: [PERFORM] Stalls on PGSemaphoreLock
Thanks all:

Ray: Thanks, we started to look at the hardware/firmware, but didn't get to that level of detail or to running sar. I will probably collect more detail in this area if I continue to see issues.

Pavy: I hope that you are right that the hugepage setting is the issue. I was under the impression that I had it disabled already, because this has been a known issue for us in the past, but it turns out this was not the case for the server in question. I have disabled it at this time, but it will take a few days of running without issue before I am comfortable declaring that this is the solution. Can you elaborate on the change you mention to upgrade the semaphore configuration? I think this is not something I have looked at before.

Ashutosh: Thanks for the reply, I started to do that at first. I turned on log_statement=all for a few hours and generated a few GB of log file, and I didn't want to leave it running in that state for too long because the issue happens every few days, and not on any regular schedule, so I reverted that after collecting a few GB of detail in the pg log. What I'm doing now - sampling every few seconds - is, I think, giving me a decent picture of what is going on when the incident occurs, and is a level of data collection that I am more comfortable will not impact operations. I am also logging at the level of 'mod' and all durations over 500ms. I don't see that large write operations are a contributing factor leading up to these incidents.

I'm hoping that disabling the hugepage setting will be the solution to this. I'll check back in a day or two with feedback.

Thanks,
Matt

From: Pavy Philippe [philippe.p...@worldline.com]
Sent: Tuesday, March 25, 2014 1:45 PM
To: Ray Stell; Matthew Spilich
Cc: pgsql-performance@postgresql.org
Subject: RE : [PERFORM] Stalls on PGSemaphoreLock

Hello

Recently I had a similar problem. The first symptom was a freeze of the connections and 100% system CPU for between 2 and 10 minutes, 1 or 2 times per day. Connection impossible, slow queries. The strace on one backend showed a very long system call on semop(). We have a node with 48 cores and 128 GB of memory. We disabled hugepages and upgraded the semaphore configuration, and since then we no longer have any freezes on our instance. Can you check the hugepage and semaphore configuration on your node? I am interested in this case, so do not hesitate to report back. Thanks.

excuse me for my bad english !!!

From: pgsql-performance-ow...@postgresql.org [pgsql-performance-ow...@postgresql.org] on behalf of Ray Stell [ste...@vt.edu]
Sent: Tuesday, March 25, 2014 18:17
To: Matthew Spilich
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Stalls on PGSemaphoreLock

On Mar 25, 2014, at 8:46 AM, Matthew Spilich wrote:

> The symptom: The database machine (running postgres 9.1.9 on CentOS 6.4) is running at low utilization most of the time, but once every day or two, it will appear to slow down to the point where queries back up and clients are unable to connect. ... Has anyone on the forum seen something similar? Any suggestions on what to look at next? If it is helpful to describe the server hardware: it has 2 E5-2670 CPUs and 256 GB of RAM, and the database is hosted on 1.6TB of RAID 10 local storage (15K 300 GB drives).

I could be way off here, but years ago I experienced something like this (in Oracle land), and after some stressful chasing, the marginal failure of the RAID controller revealed itself. Same kind of event: steady traffic, and then some I/O would not complete and normal ops would stack up. Anyway, what you report reminded me of that event. The E5 is a few years old; I wonder if the RAID controller firmware needs a patch? I suppose a marginal power supply might cause a similar hang. Anyway, marginal failures are very painful. Have you checked sar or OS logging at event time?
Re: [PERFORM] DELETE vs TRUNCATE explanation
Tom Lane wrote:
> (3) The performance of the truncation itself should not be viewed in isolation; subsequent behavior also needs to be considered. An example of possible degradation is that index bloat would no longer be guaranteed to be cleaned up over a series of repeated truncations. (You might argue that if the table is small then the indexes couldn't be very bloated, but I don't think that holds up over a long series.)
>
> IOW, I think it's fine as-is. I'd certainly wish to see many more than one complainant before we expend effort in this area.

I think a documentation change would be worthwhile. At the moment the TRUNCATE page says, with no caveats, that it is faster than unqualified DELETE. It surprised me to find that this wasn't true (with 7.2, again with small tables in a testsuite), and evidently it's still surprising people today.

-M-
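For anyone who wants to check the claim on their own data, the comparison is easy to reproduce (a sketch; the underlying point is that TRUNCATE's cost is a fixed per-table overhead, so DELETE can win on very small tables):

    \timing on
    DELETE FROM small_table;   -- row-by-row; leaves dead tuples behind for VACUUM
    TRUNCATE small_table;      -- swaps in a new relfilenode; cost is independent of row count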
Re: [PERFORM] Sorted group by
tracker_objectid on tracker b  (cost=0.00..1096747.23 rows=3616658 width=17) (actual time=0.084..5781.844 rows=3790872 loops=1)
         Index Cond: (objectid 120)
 Total runtime: 55756.482 ms
(14 rows)

On Tue, 10 Aug 2010, Jonathan Blitz wrote:
> Select groupfield, value
> From tbl x1
> Where number = (select max(number) from tbl x2 where x2.groupfield = x1.groupfield)

This one effectively forces a nested loop join:

explain analyse select objectid, fieldname, sourcename from tracker as a
where version = (select max(version) from tracker as b
                 where a.objectid = b.objectid and a.fieldname = b.fieldname)
and a.objectid 120;

                                                              QUERY PLAN
-----------------------------------------------------------------------------
 Index Scan using tracker_objectid on tracker a  (cost=0.00..58443381.75 rows=18083 width=34) (actual time=6.482..59803.225 rows=1802376 loops=1)
   Index Cond: (objectid 120)
   Filter: (version = (SubPlan 2))
   SubPlan 2
     ->  Result  (cost=15.89..15.90 rows=1 width=0) (actual time=0.011..0.012 rows=1 loops=3790872)
           InitPlan 1 (returns $2)
             ->  Limit  (cost=0.00..15.89 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=3790872)
                   ->  Index Scan Backward using tracker_all on tracker b  (cost=0.00..31.78 rows=2 width=4) (actual time=0.005..0.005 rows=1 loops=3790872)
                         Index Cond: (($0 = objectid) AND ($1 = fieldname))
                         Filter: (version IS NOT NULL)
 Total runtime: 61649.116 ms
(11 rows)

On Tue, 10 Aug 2010, Jonathan Blitz wrote:
> Select groupfield, value
> From tbl x1
> Where (groupfield, number) in (select groupfield, max(number) from tbl group by groupfield)

This is another join.

explain analyse select objectid, fieldname, sourcename from tracker
where (objectid, fieldname, version) in
  (select objectid, fieldname, max(version) from tracker group by objectid, fieldname);

I terminated this query after about an hour. Here is the EXPLAIN:

                                                              QUERY PLAN
-----------------------------------------------------------------------------
 Hash Join  (cost=55310973.80..72060974.32 rows=55323 width=30)
   Hash Cond: ((public.tracker.objectid = public.tracker.objectid) AND (public.tracker.fieldname = public.tracker.fieldname) AND (public.tracker.version = (max(public.tracker.version
   ->  Seq Scan on tracker  (cost=0.00..5310600.80 rows=293972480 width=34)
   ->  Hash  (cost=54566855.96..54566855.96 rows=29397248 width=40)
         ->  GroupAggregate  (cost=50965693.08..54272883.48 rows=29397248 width=17)
               ->  Sort  (cost=50965693.08..51700624.28 rows=293972480 width=17)
                     Sort Key: public.tracker.objectid, public.tracker.fieldname
                     ->  Seq Scan on tracker  (cost=0.00..5310600.80 rows=293972480 width=17)
(8 rows)

Matthew

-- 
I quite understand I'm doing algebra on the blackboard and the usual response is to throw objects... If you're going to freak out... wait until party time and invite me along -- Computer Science Lecturer
[PERFORM] Sorted group by
I'm trying to eke a little bit more performance out of an application, and I was wondering if there was a better way to do the following:

I am trying to retrieve, for many sets of rows grouped on a couple of fields, the value of an ungrouped field where the row has the highest value in another ungrouped field. For instance, I have the following table setup:

group  | whatever type
value  | whatever type
number | int

Index: group

I then have rows like this:

group | value     | number
---------------------------
Foo   | foo       |      1
Foo   | turnips   |      2
Bar   | albatross |      3
Bar   | monkey    |      4

I want to receive results like this:

group | value
----------------
Foo   | turnips
Bar   | monkey

Currently, I do this in my application by ordering by the number and only using the last value. I imagine that this is something that can be done in the new Postgres 9, with a sorted group by - something like this:

SELECT group, LAST(value, ORDER BY number) FROM table GROUP BY group

Is this something that is already built in, or would I have to write my own LAST aggregate function?

Matthew

-- 
The third years are wandering about all worried at the moment because they have to hand in their final projects. Please be sympathetic to them, say things like "ha-ha-ha", but in a sympathetic tone of voice -- Computer Science Lecturer
Re: [PERFORM] Sorted group by
On Tue, 10 Aug 2010, Thomas Kellerer wrote:
> No. It's built in (8.4) and it's called Windowing functions:
> http://www.postgresql.org/docs/8.4/static/tutorial-window.html
> http://www.postgresql.org/docs/8.4/static/functions-window.html
>
> SELECT group, last_value(value) over(ORDER BY number) FROM table

I may be mistaken, but as I understand it, a windowing function doesn't reduce the number of rows in the results?

Matthew

-- 
Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats. -- Howard Aiken
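Right - last_value() as written returns one row per input row (and without a PARTITION BY it runs over the whole ordered set). For the one-row-per-group result described in the original post, one approach that needs no custom aggregate is PostgreSQL's DISTINCT ON - a sketch against the example table from the original post ("group" must be quoted because it is a reserved word, and my_table stands in for the real table name):

    SELECT DISTINCT ON ("group") "group", value
      FROM my_table
     ORDER BY "group", number DESC;   -- the first row per group under this ordering wins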
Re: [PERFORM] Advice configuring ServeRAID 8k for performance
On Thu, 5 Aug 2010, Scott Marlowe wrote:
> RAID6 is basically RAID5 with a hot spare already built into the array.

On Fri, 6 Aug 2010, Pierre C wrote:
> As others said, RAID6 is RAID5 + a hot spare.

No. RAID6 is NOT RAID5 plus a hot spare.

RAID5 uses a single parity datum (XOR) to ensure protection against data loss if one drive fails. RAID6 uses two different sets of parity (Reed-Solomon) to ensure protection against data loss if two drives fail simultaneously.

If you have a RAID5 set with a hot spare, and you lose two drives, then you have data loss. If the same happens to a RAID6 set, then there is no data loss.

Matthew

-- 
And the lexer will say "Oh look, there's a null string. Oooh, there's another. And another.", and will fall over spectacularly when it realises there are actually rather a lot. - Computer Science Lecturer (edited)
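To put numbers on that (a worked example, not from the thread): with eight drives, RAID5 across seven drives plus one hot spare and RAID6 across all eight both give six drives of usable capacity. The difference is that the RAID5 set survives a second failure only if it happens after the rebuild onto the spare has finished - which can take many hours on large drives - whereas the RAID6 set survives any two simultaneous failures.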
Re: [PERFORM] Testing Sandforce SSD
On Sun, 25 Jul 2010, Yeb Havinga wrote:
> Graph of TPS at http://tinypic.com/r/b96aup/3 and latency at http://tinypic.com/r/x5e846/3

Does your latency graph really have milliseconds as the y axis? If so, this device is really slow - some requests have a latency of more than a second!

Matthew

-- 
The early bird gets the worm, but the second mouse gets the cheese.
Re: [PERFORM] Testing Sandforce SSD
On Mon, 26 Jul 2010, Greg Smith wrote:
> Matthew Wakeling wrote:
>> Does your latency graph really have milliseconds as the y axis? If so, this device is really slow - some requests have a latency of more than a second!
>
> Have you tried that yourself? If you generate one of those with standard hard drives and a BBWC under Linux, I expect you'll discover those latencies to be 5 seconds long. I recently saw 100 *seconds* running a large pgbench test due to latency flushing things to disk, on a system with 72GB of RAM. Takes a long time to flush 3GB of random I/O out to disk when the kernel will happily cache that many writes until checkpoint time.

Apologies, I was interpreting the graph as the latency of the device, not all the layers in-between as well. There isn't any indication in the email with the graph as to what the test conditions or software are. Obviously if you factor in checkpoints and the OS writing out everything, then you have to expect some large latency operations. However, if the device itself behaved as in the graph, I would be most unhappy and send it back.

Yeb also made the point - there are far too many points on that graph to really tell what the average latency is. It'd be instructive to have a few figures, like "only x% of requests took longer than y".

Matthew

-- 
I wouldn't be so paranoid if you weren't all out to get me!!
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Sat, 10 Jul 2010, Tom Lane wrote:
>> Doesn't pgpool do this?
>
> No, and in fact that's exactly why the proposed implementation isn't ever going to be in core: it's not possible to do it portably.

I'm surprised. Doesn't apache httpd do this? Does it have to do a whole load of non-portable stuff? It seems to work on a whole load of platforms.

Matthew

-- 
I would like to think that in this day and age people would know better than to open executables in an e-mail. I'd also like to be able to flap my arms and fly to the moon. -- Tim Mullen
Re: [PERFORM] Need help in performance tuning.
On Fri, 9 Jul 2010, Kevin Grittner wrote:
> Any thoughts on the minimalist solution I suggested a couple weeks ago?:
>
> http://archives.postgresql.org/pgsql-hackers/2010-06/msg01385.php
> http://archives.postgresql.org/pgsql-hackers/2010-06/msg01387.php
>
> So far, there has been no comment by anyone

Interesting idea. As far as I can see, you are suggesting solving the "too many connections" problem by allowing lots of connections, but only allowing a certain number to do anything at a time?

A proper connection pool provides the following advantages over this:

1. The pool can be on a separate machine or machines, spreading load.

2. The pool has a lightweight footprint per connection, whereas Postgres doesn't.

3. A large amount of the overhead is sometimes connection setup, which this would not solve. A pool has cheap setup.

4. This could cause Postgres backends to be holding onto large amounts of memory while being prevented from doing anything, which is a bad use of resources.

5. A fair amount of the overhead is caused by context-switching between backends. The more backends, the less useful any CPU caches.

6. There are some internal workings of Postgres that involve keeping all the backends informed about something going on. The more backends, the greater this overhead is. (This was pretty bad with the sinval queue overflowing a while back, but is a bit better now. It still causes some overhead.)

7. That lock would have a metric *($!-load of contention.

Matthew

-- 
Unfortunately, university regulations probably prohibit me from eating small children in front of the lecture class. -- Computer Science Lecturer
Re: [PERFORM] Need help in performance tuning.
On Fri, 9 Jul 2010, Kevin Grittner wrote:
>> Interesting idea. As far as I can see, you are suggesting solving the "too many connections" problem by allowing lots of connections, but only allowing a certain number to do anything at a time?
>
> Right.

I think in some situations this arrangement would be an advantage. However, I do not think it will suit the majority of situations, and it could reduce performance when the user doesn't need the functionality, either because they have a pool already or they don't have many connections. No, I don't have any numbers.

>> 1. The pool can be on a separate machine or machines, spreading load.
>
> Sure, but how would you do that with a built-in implementation?

That's my point exactly. If you have an external pool solution, you can put it somewhere else - maybe on multiple somewhere-elses.

>> 3. A large amount of the overhead is sometimes connection setup, which this would not solve. A pool has cheap setup.
>
> This would probably be most useful where the client held a connection for a long time, not for the "login for each database transaction" approach. I'm curious how often you think application software uses that approach.

What you say is true. I don't know how often that is, but it seems to be those times that people come crying to the mailing list.

>> 4. This could cause Postgres backends to be holding onto large amounts of memory while being prevented from doing anything, which is a bad use of resources.
>
> Isn't this point 2 again?

Kind of. Yes. Point 2 was simple overhead. This point is that the backend may have done a load of query-related allocation, and then been stopped.

>> 7. That lock would have a metric *($!-load of contention.
>
> Here I doubt you. It would be held for such short periods that I suspect that collisions would be relatively infrequent compared to some of the other locks we use. As noted in the email, it may actually normally be an increment and test within an existing locked block.

Fair enough. It may be much less of a problem than I had previously thought.

Matthew

-- 
Change is inevitable, except from vending machines.
Re: [PERFORM] Two different execution plan for the same request
On Wed, 7 Jul 2010, JOUANIN Nicolas (44) wrote:
> It seems to work fine (same execution plan and less duration) after:
> - setting default_statistics_target to 100
> - full vacuum with analyze

Don't do VACUUM FULL.

Matthew

-- 
I suppose some of you have done a Continuous Maths course. Yes? Continuous Maths? <menacing stares from audience> Whoah, it was like that, was it! -- Computer Science Lecturer
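For anyone finding this in the archives: the statistics change is most likely the part that mattered, and it does not require VACUUM FULL - a plain ANALYZE refreshes the statistics. A sketch, setting the target per-session rather than in postgresql.conf as the poster did:

    SET default_statistics_target = 100;
    ANALYZE;   -- plain ANALYZE; no exclusive lock, no VACUUM FULL needed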
Re: [PERFORM] how to (temporarily) disable/minimize benefits of disk block cache or postgresql shared buffer
On Fri, Jul 2, 2010 at 8:30 AM, Craig Ringer cr...@postnewspapers.com.au wrote:
> Yeah, if you're in a weird virtualized environment like that you're likely to have problems...

On Sat, 3 Jul 2010, Rajesh Kumar Mallah wrote:
> Thanks for thinking about it. I do not understand why u feel OpenVz is weird. at the most its not very popular.

It's not OpenVz that is weird, but virtualisation in general. If you are running in a virtual machine, then all sorts of things will not run as well as expected.

Matthew

-- 
Contrary to popular belief, Unix is user friendly. It just happens to be very selective about who its friends are. -- Kyle Hearn
Re: [PERFORM] Highly Efficient Custom Sorting
On Fri, Jul 2, 2010 at 12:08 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> I'm guessing from tea leaves, but the impression I got from Eliot's description is that he's using plpgsql functions as sort comparators. It's not surprising that that sucks performance-wise compared to having the equivalent logic in C/C++ functions used as comparators on the client side. plpgsql is no speed demon. Best fix might be to code the comparators as C functions on the server side.

On Fri, 2 Jul 2010, Eliot Gable wrote:
> I guess the real question is, is a generic C sorting function my only real alternative?

Sounds to me like you are not really listening. You don't need to code an entire sorting algorithm in C, as Postgres already has a pretty good one of those. All you need to do is implement a comparator of some kind. Inserting C functions into Postgres is pretty easy, especially on the level of comparators.

Matthew

-- 
For those of you who are into writing programs that are as obscure and complicated as possible, there are opportunities for... real fun here -- Computer Science Lecturer
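For concreteness, here is what "inserting a comparator" looks like from the SQL side, following the pattern of the operator-class example in the PostgreSQL manual. This is a sketch: the type and function names are illustrative, and the C bodies (which must return boolean for the operators and a negative/zero/positive integer for the comparator) are not shown:

    CREATE FUNCTION mytype_lt(mytype, mytype) RETURNS boolean
        AS 'MODULE_PATHNAME' LANGUAGE C IMMUTABLE STRICT;
    -- ...likewise mytype_le, mytype_eq, mytype_ge, mytype_gt...
    CREATE FUNCTION mytype_cmp(mytype, mytype) RETURNS integer
        AS 'MODULE_PATHNAME' LANGUAGE C IMMUTABLE STRICT;

    CREATE OPERATOR < (
        leftarg = mytype, rightarg = mytype, procedure = mytype_lt,
        commutator = >, negator = >=,
        restrict = scalarltsel, join = scalarltjoinsel
    );
    -- ...likewise <=, =, >= and >...

    CREATE OPERATOR CLASS mytype_ops
        DEFAULT FOR TYPE mytype USING btree AS
            OPERATOR 1 <,
            OPERATOR 2 <=,
            OPERATOR 3 =,
            OPERATOR 4 >=,
            OPERATOR 5 >,
            FUNCTION 1 mytype_cmp(mytype, mytype);

Once the operator class exists, a plain ORDER BY on a column of that type sorts using the C comparator, with no plpgsql in the inner loop.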
Re: [PERFORM] ideal storage configuration
On Tue, 29 Jun 2010, Samuel Gendler wrote:
> The copy statements execute in a small fraction of the minute in which they occur.

I'm going to ask a silly question here. If the system is already coping quite well with the load, then why are you changing it?

> All old data gets removed by dropping older partitions. There are no updates at all.

That's good.

> 6 internal drives on battery backed raid (I don't know what RAID level - is there a way to discover this?), all in a single filesystem, so WAL and data are on the same filesystem. I don't believe that we are taking advantage of the battery backed controller, since I only see this in /etc/fstab:
>
> UUID=12dcd71d-8aec-4253-815c-b4883195eeb8 / ext3 defaults 1 1

That doesn't have anything to do with whether or not the controller has a BBU cache. If the controller does have a BBU cache, then your writes will return quicker - and nothing else.

> But inserts are happening so rapidly that I don't imagine that getting rid of fsync is going to change performance of the reporting queries too dramatically.

Don't get rid of fsync, unless you want to lose your data. Especially with your workload of large transactions, you do not need the benefit of reducing the transaction latency, and even that benefit is not present if you have a BBU cache.

It seems like your current disc array is coping quite well with the write traffic. If you really want to boost your read speeds for your reporting queries, then increase the amount of RAM, as Kevin said, and see if you can fit the active portion of the database into RAM.

Matthew

-- 
Nog: Look! They've made me into an ensign!
O'Brien: I didn't know things were going so badly.
Nog: Frightening, isn't it?
Re: [PERFORM] WAL+Os on a single disk
On Wed, 23 Jun 2010, Scott Marlowe wrote:
>> We have a 12 x 600G hot swappable disk system (raid 10) and 2 internal disks (2 x 146G). Does it make sense to put the WAL and OS on the internal disks?
>
> So for us, the WAL and OS and logging on the same data set works well.

Generally, it is recommended that you put the WAL onto a separate disc to the data. However, in this case, I would be careful. It may be that the 12-disc array is more capable. Specifically, it is likely that the 12-disc array has a battery backed cache, but the two internal drives (RAID 1 presumably) do not. If this is the case, then putting the WAL on the internal drives will reduce performance, as you will only be able to commit a transaction once per revolution of the internal discs. In contrast, if the WAL is on a battery backed cache array, then you can commit much more frequently.

Test it and see.

Matthew

-- 
I don't want the truth. I want something I can tell parliament! -- Rt. Hon. Jim Hacker MP
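To quantify the once-per-revolution point (a hedged example, assuming 10,000 RPM internal drives): 10000/60 is roughly 167 revolutions per second, so a WAL on a bare RAID 1 of such drives caps a single session at roughly 167 commits per second no matter how fast the rest of the system is. A battery backed cache acknowledges the write as soon as it is in cache, removing that ceiling.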
Re: [PERFORM] Write performance
On Thu, 24 Jun 2010, Janning wrote:
> We have a 12 GB RAM machine with intel i7-975, using 3 Seagate Barracuda 7200.11 disks, ST31500341AS (1.5 GB)

Those discs are 1.5TB, not 1.5GB.

> One disk for the system and WAL etc. and one SW RAID-0 with two disks for postgresql data. Our database is about 24GB.

Beware of RAID-0 - make sure you can recover the data when (not if) a disc fails.

> Our munin graph reports at 9 o'clock writes of 3000 blocks per second and reads of about 1000 blocks per second on our disk which holds the data directories of postgresql (WAL are on a different disk).
>
> 3000 blocks ~ about 3 MB/s write
> 1000 blocks ~ about 1 MB/s read
>
> At the same time we have nearly 50% CPU I/O wait and only 12% user CPU load (so 4 of 8 cpu cores are in use for io wait)

Not quite sure what situation you are measuring these figures under. However, as a typical figure, let's say you are doing random access with 8kB blocks (as in Postgres), and the access time on your drive is 8.5ms (as with these drives). For each drive, you will be able to read/write approximately 8kB / 0.0085s, giving 941kB per second. If you have multiple processes all doing random access, then you may be able to utilise both discs and get double that. Of course, writing large chunks is quite a different usage pattern.

> But I am wondering that writing 3MB/s and reading 1 MB/s seems to be a limit, if I can run a test with 89 MB/s writing and 110MB/s reading.

That's quite right, and those are typical performance figures for a drive like that.

Matthew

-- 
Don't criticise a man until you have walked a mile in his shoes; and if you do at least he will be a mile behind you and bare footed.
Re: [PERFORM] raid10 write performance
On Wed, 23 Jun 2010, Ivan Voras wrote:
> On 06/23/10 14:00, Florian Weimer wrote:
>> Barrier support on RAID10 seems to require some smallish amount of non-volatile storage which supports a high number of write operations per second, so a software-only solution might not be available.
>
> If I understand you correctly, this can be said in general for all spinning-disk usage and is not specific to RAID10. (And in the case of high, constant TPS, no amount of NVRAM will help you.)

No. Write barriers work fine with a single disc, assuming it is set up correctly. The barrier is a command telling the disc to make sure that one piece of data is safe before starting to write another piece of data. However, as soon as you have multiple discs, the individual discs do not have a way of communicating with each other to make sure that the first piece of data is written before the other. That's why you need a little bit of non-volatile storage to mediate that, to properly support barriers.

Of course, from a performance point of view, yes, you need some NVRAM on any kind of spinning storage to maintain high commit rates.

Matthew

-- 
I wouldn't be so paranoid if you weren't all out to get me!!
Re: [PERFORM] PostgreSQL as a local in-memory cache
Dimitri Fontaine wrote:
> Well I guess I'd prefer a per-transaction setting

Not possible, as many others have said. As soon as you make an unsafe transaction, all the other transactions have nothing to rely on.

On Thu, 17 Jun 2010, Pierre C wrote:
> A per-table (or per-index) setting makes more sense IMHO. For instance "on recovery, truncate this table" (this was mentioned before).

That would be much more valuable. I'd like to point out the costs involved in having a whole separate version of Postgres that has all this safety switched off. Package managers will not thank anyone for having to distribute another version of the system, and woe betide the user who installs the wrong version because "it runs faster". No, this is much better as a configurable option.

Going back to the "on recovery, truncate this table" idea: we already have a mechanism for skipping the WAL writes on an entire table - we do that for tables that have been created in the current transaction. It would surely be a small step to allow this to be configurably permanent on a particular table.

Moreover, we already have a mechanism for taking a table that has had non-logged changes, and turning it into a fully logged table - we do that to the above-mentioned tables when the transaction commits. I would strongly recommend providing an option to ALTER TABLE MAKE SAFE, which may involve some more acrobatics if the table is currently in use by multiple transactions, but would be valuable.

This would allow users to create "temporary tables" that can be shared by several connections. It would also allow bulk loading in parallel of a single large table.

With these suggestions, we would still need to WAL-log all the metadata changes, but I think in most circumstances that is not going to be a large burden on performance.

Matthew

-- 
Picard: I was just paid a visit from Q.
Riker: Q! Any idea what he's up to?
Picard: No. He said he wanted to be nice to me.
Riker: I'll alert the crew.
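The existing mechanism referred to above is easy to see in action (a sketch, assuming WAL archiving is off so the optimization applies: a table created and bulk-loaded in the same transaction skips WAL for the load, and the contents are fsynced at commit instead - the file path is illustrative):

    BEGIN;
    CREATE TABLE bulk_load_target (id integer, payload text);
    COPY bulk_load_target FROM '/tmp/data.csv' CSV;   -- not WAL-logged in this case
    COMMIT;                                           -- contents fsynced here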
Re: [PERFORM] B-Heaps
On Fri, 18 Jun 2010, Robert Haas wrote: On Tue, Jun 15, 2010 at 8:23 AM, Matthew Wakeling matt...@flymine.org wrote: Absolutely, as I said in http://archives.postgresql.org/pgsql-performance/2010-03/msg00272.php - but applied to the Postgres B-tree indexes instead of heaps. This is an interesting idea. I would guess that you could simulate this to some degree by compiling PG with a larger block size. Have you tried this to see whether/how much/for what kind of workloads it helps? To a degree, that is the case. However, if you follow the thread a bit further back, you will find evidence that when the index is in memory, increasing the page size actually decreases the performance, because it uses more CPU. To make it clear - 8kB is not an optimal page size for either fully cached data or sparsely cached data. For disc access, large pages are appropriate, on the order of 256kB. If the page size is much lower than that, then the time taken to fetch it doesn't actually decrease much, and we are trying to get the maximum amount of work done per fetch without slowing fetches down significantly. Given such a large page size, it would then be appropriate to have a better data structure inside the page. Currently, our indexes (at least the GiST ones - I haven't looked at the Btree ones) use a simple linear array in the index page. Using a proper tree inside the index page would improve the CPU usage of the index lookups. One detail that would need to be sorted out is the cache eviction policy. I don't know if it is best to evict whole 256kB pages, or to evict 8kB pages. Probably the former, which would involve quite a bit of change to the shared memory cache. I can see the cache efficiency decreasing as a result of this, which is the only disadvantage I can see. This sort of thing has been fairly well researched at an academic level, but has not been implemented in that many real world situations. I would encourage its use in Postgres. Matthew -- Failure is not an option. It comes bundled with your Microsoft product. -- Ferenc Mantfeld -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Parallel queries for a web-application |performance testing
On Wed, 16 Jun 2010, Balkrishna Sharma wrote: Hello, I will have a web application having postgres 8.4+ as backend. At any given time, there will be max of 1000 parallel web-users interacting with the database (read/write). I wish to do performance testing of 1000 simultaneous read/write to the database. When you set up a server that has high throughput requirements, the last thing you want to do is use it in a manner that cripples its throughput. Don't try and have 1000 parallel Postgres backends - it will process those queries slower than the optimal setup. You should aim to have approximately ((2 * cpu core count) + effective spindle count) number of backends, as that is the point at which throughput is the greatest. You can use pgbouncer to achieve this. I can do a simple unix script on the postgres server and have parallel updates fired for example with an ampersand at the end. Example: echo '\timing \\update DAPP.emp_data set f1 = 123 where emp_id =0;' | psql test1 postgres|grep Time:|cut -d' ' -f2- > /home/user/Documents/temp/logs/$NUM.txt & pid1=$! echo '\timing \\update DAPP.emp_data set f1 = 123 where emp_id =2;' | psql test1 postgres|grep Time:|cut -d' ' -f2- > /home/user/Documents/temp/logs/$NUM.txt & pid2=$! echo '\timing \\update DAPP.emp_data set f1 = 123 where emp_id =4;' | psql test1 postgres|grep Time:|cut -d' ' -f2- > /home/user/Documents/temp/logs/$NUM.txt & pid3=$! . Don't do that. The overhead of starting up an echo, a psql, and a grep will limit the rate at which these queries can be fired at Postgres, and consume quite a lot of CPU. Use a proper benchmarking tool, possibly on a different server. Also, you should be using a different username to postgres - that one is kind of reserved for superuser operations. Matthew -- People who love sausages, respect the law, and work with IT standards shouldn't watch any of them being made. -- Peter Gutmann -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
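To make that advice concrete: pgbench, which ships with Postgres, can drive exactly this kind of workload from a custom script. A sketch (DAPP.emp_data, f1 and emp_id are the poster's; the script file name and the parameter range are made up, and the \set random(...) syntax shown is from pgbench 9.6+):

-- bench.sql: one random-row update per transaction
\set emp_id random(0, 10000)
UPDATE DAPP.emp_data SET f1 = 123 WHERE emp_id = :emp_id;

Run with, for instance: pgbench -n -c 16 -j 4 -T 60 -f bench.sql test1 - that is, sixteen clients on four threads for sixty seconds, far closer to the optimal backend count than 1000 shell jobs, and with per-transaction latency reporting built in.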
Re: [PERFORM] B-Heaps
On Mon, 14 Jun 2010, Eliot Gable wrote: Just curious if this would apply to PostgreSQL: http://queue.acm.org/detail.cfm?id=1814327 Absolutely, as I said in http://archives.postgresql.org/pgsql-performance/2010-03/msg00272.php - but applied to the Postgres B-tree indexes instead of heaps. It's a pretty obvious performance improvement really - the principle is that when you do have to fetch a page from a slower medium, you may as well make it count for a lot. Lots of research has already been done on this - the paper linked above is rather behind the times. However, AFAIK, Postgres has not implemented this in any of its indexing systems. Matthew -- An ant doesn't have a lot of processing power available to it. I'm not trying to be speciesist - I wouldn't want to detract you from such a wonderful creature, but, well, there isn't a lot there, is there? -- Computer Science Lecturer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] slow query performance
On Fri, 11 Jun 2010, Kenneth Marshall wrote: If you check the archives, you will see that this is not easy to do because of the effects of caching. Indeed. If you were to take the value completely at face value, a modern hard drive is capable of transferring sequential pages somewhere between 40 and 100 times faster than random pages, depending on the drive. However, caches tend to favour index scans much more than sequential scans, so using a value between 40 and 100 would discourage Postgres from using indexes when they are really the most appropriate option. Matthew -- A. Top Posters Q. What's the most annoying thing in the world? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] slow query
On Thu, 3 Jun 2010, Anj Adu wrote: http://explain.depesz.com/s/kHa I'm interested in why the two partitions dev4_act_dy_fact and dev4_act_dy_fact_2010_05_t3 are treated so differently. I'm guessing that the former is the parent and the latter the child table? When accessing the parent table, Postgres is able to use a bitmap AND index scan, because it has the two indexes dev4_act_dy_dm_nd_indx and dev4_act_dy_dm_cd_indx. Do the child tables have a similar index setup? Incidentally, you could get even better than a bitmap AND index scan by creating an index on (node_id, thedate) on each table. random_page_cost=1 I agree with Tomas that this is rarely a useful setting. Matthew -- You can configure Windows, but don't ask me how. -- Bill Gates -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
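For illustration, the suggested composite index, created on the parent and each child (the table and column names are from the plan; the index names are made up):

CREATE INDEX dev4_act_dy_fact_nd_dt_idx
    ON dev4_act_dy_fact (node_id, thedate);
CREATE INDEX dev4_act_dy_fact_2010_05_t3_nd_dt_idx
    ON dev4_act_dy_fact_2010_05_t3 (node_id, thedate);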
Re: [PERFORM] Weird XFS WAL problem
On Thu, 3 Jun 2010, Greg Smith wrote: "And it's also quite reasonable for a RAID controller to respond to that 'flush the whole cache' call by flushing its cache." Remember that the RAID controller is presenting itself to the OS as a large disc, and hiding the individual discs from the OS. Why should the OS care what has actually happened to the individual discs' caches, as long as that "flush the whole cache" command guarantees that the data is persistent. Taking the RAID array as a whole, that happens when the data hits the write-back cache. The only circumstance where you actually need to flush the data to the individual discs is when you need to take that disc away somewhere else and read it on another system. That's quite a rare use case for a RAID array (http://thedailywtf.com/Articles/RAIDing_Disks.aspx notwithstanding). If the controller had some logic that said "it's OK to not flush the cache when that call comes in if my battery is working fine", that would make this whole problem go away. The only place this can be properly sorted is the RAID controller. Anywhere else would be crazy. Matthew -- To err is human; to really louse things up requires root privileges. -- Alexander Pope, slightly paraphrased -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SELECT ignoring index even though ORDER BY and LIMIT present
On Wed, 2 Jun 2010, Jori Jovanovich wrote: (2) Making the query faster by making the string match LESS specific (odd, seems like it should be MORE) No, that's the way round it should be. The LIMIT changes it all. Consider if you have a huge table, and half of the entries match your WHERE clause. To fetch the ORDER BY ... LIMIT 20 using an index scan would involve accessing only on average 40 entries from the table referenced by the index. Therefore, the index is quick. However, consider a huge table that only has twenty matching entries. The index scan would need to touch every single row in the table to return the matching rows, so a sequential scan, filter, and sort would be much faster. Of course, if you had an index capable of answering the WHERE clause, that would be even better for that case. Matthew -- Don't criticise a man until you have walked a mile in his shoes; and if you do at least he will be a mile behind you and bare footed. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
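A sketch of the two index shapes being discussed, with made-up names (the original query matched a string prefix and sorted by time):

-- Serves ORDER BY ... LIMIT well when matches are plentiful:
CREATE INDEX events_created_idx ON events (created DESC);

-- Serves the WHERE clause well when matches are rare
-- (text_pattern_ops makes LIKE 'prefix%' indexable regardless of locale):
CREATE INDEX events_message_idx ON events (message text_pattern_ops);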
Re: [PERFORM] Weird XFS WAL problem
On Thu, 3 Jun 2010, Craig James wrote: "Also, are barriers *on* on the RAID1 mount and off on the RAID10 one? It was the barriers. barrier=1 isn't just a bad idea on ext4, it's a disaster." This worries me a little. Does your array have a battery-backed cache? If so, then it should be fast regardless of barriers (although barriers may make a small difference). If it does not, then it is likely that the fast speed you are seeing with barriers off is unsafe. There should be no "just missed the sector going past" problem for writes ever with a battery-backed cache. Matthew -- There once was a limerick .sig that really was not very big It was going quite fine Till it reached the fourth line -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Overusing 1 CPU
On Wed, 2 Jun 2010, Mozzi wrote: This box is basically idle @ the moment as it is still in testing yet top shows high usage on just 1 of the cores. First port of call: What process is using the CPU? Run top on a fairly wide terminal and use the 'c' key to show the full command line. Matthew -- Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. - Kernighan -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?
On Sun, 23 May 2010, David Jarvis wrote: The measurement table indexes (on date and weather station) were not being used because the only given date ranges (e.g., 1900 - 2009) were causing the planner to do a full table scan, which is correct. I wonder if you might see some benefit from CLUSTERing the tables on the index. Matthew -- And the lexer will say Oh look, there's a null string. Oooh, there's another. And another., and will fall over spectacularly when it realises there are actually rather a lot. - Computer Science Lecturer (edited) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
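For reference, that would be something like the following (the index name is made up; CLUSTER ... USING is the newer spelling of the syntax, and the physical ordering is not maintained for rows inserted later, so it needs re-running after bulk loads):

CLUSTER climate.measurement USING measurement_taken_idx;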
Re: [PERFORM] PostgreSQL Function Language Performance: C vs PL/PGSQL
On Fri, 28 May 2010, Merlin Moncure wrote: At best, if you are a ninja with the marginally documented backend api, you will create code that goes about as fast as your pl/pgsql function for 10 times the amount of input work, unless there are heavy amounts of 'other than sql' code in your function. The reason to write C in the backend is: *) Interface w/3rd party libraries w/C linkage *) Do things that are illegal in regular SQL (write files, etc) *) Make custom types The major case I found when writing pl/pgsql was when trying to build arrays row by row. AFAIK when I tried it, adding a row to an array caused the whole array to be copied, which put a bit of a damper on performance. Matthew -- The problem with defending the purity of the English language is that English is about as pure as a cribhouse whore. We don't just borrow words; on occasion, English has pursued other languages down alleyways to beat them unconscious and rifle their pockets for new vocabulary. - James Nicoll -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL Function Language Performance: C vs PL/PGSQL
On Tue, 1 Jun 2010, Stephen Frost wrote: * Matthew Wakeling (matt...@flymine.org) wrote: The major case I found when writing pl/pgsql was when trying to build arrays row by row. AFAIK when I tried it, adding a row to an array caused the whole array to be copied, which put a bit of a damper on performance. Using the built-ins now available in 8.4 (array_agg), that copying doesn't happen any more. Thanks. I had wondered if that had been improved. Matthew -- Our riverbanks and seashores have a beauty all can share, provided there's at least one boot, three treadless tyres, a half-eaten pork pie, some oil drums, an old felt hat, a lorry-load of tar blocks, and a broken bedstead there. -- Flanders and Swann -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
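To make the difference concrete, a sketch (the table and column are made up, and the DO block needs 9.0+; on older releases the loop would live in a function):

-- Pre-8.4 pattern: each append copies the whole array, O(n^2) overall
DO $$
DECLARE
    acc integer[] := '{}';
    r record;
BEGIN
    FOR r IN SELECT id FROM items LOOP
        acc := acc || r.id;   -- copies the accumulated array every time
    END LOOP;
END $$;

-- 8.4+: let the aggregate build the array in a single pass
SELECT array_agg(id) FROM items;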
Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?
On Fri, 21 May 2010, Yeb Havinga wrote: For time based data I would for sure go for year based indexing. On the contrary - most of the queries seem to span many years, restricting instead on the time of year. Therefore, partitioning by month or some other per-year method would seem sensible. Regarding the leap year problem, you might consider creating a modified day of year field, which always assumes that the year contains a leap day. Then a given number always resolves to a given date, regardless of year. If you then partition (or index) on that field, then you may get a benefit. In this case, partitioning is only really useful when you are going to be forced to do seq scans. If you can get a suitably selective index, in the case where you are selecting a small proportion of the data, then I would concentrate on getting the index right, rather than the partition, and maybe even not do partitioning. Matthew -- Trying to write a program that can't be written is... well, it can be an enormous amount of fun! -- Computer Science Lecturer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?
Regarding the leap year problem, you might consider creating a modified day of year field, which always assumes that the year contains a leap day. Then a given number always resolves to a given date, regardless of year. If you then partition (or index) on that field, then you may get a benefit. On Fri, 21 May 2010, Yeb Havinga wrote: Shouldn't it be just the other way around - assume all years are non leap years for the doy part field to be indexed. The mapping doesn't matter massively, as long as all days of the year can be mapped uniquely onto a number, and the numbers are sequential. Your suggestion does not satisfy the first of those two requirements. If you assume that all years are leap years, then you merely skip a number in the middle of the year, which isn't a problem when you want to check for days between two bounds. However, if you assume a non-leap year, then there is no representation for the 29th of February, so not all data points will have a representative number to insert into the database. Matthew -- No, C++ isn't equal to D. 'C' is undeclared, so we assume it's an int, with a default value of zero. Hence, C++ should really be called 1. -- met24, commenting on the quote C++ -- shouldn't it be called D? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
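A sketch of such a field as an expression index, mapping every date onto a fixed leap year (2000) so that the 29th of February keeps a representation (make_date is from 9.4; the table and column are the ones from this thread, the index name is made up):

CREATE INDEX measurement_doy366_idx ON climate.measurement (
    (extract(doy FROM make_date(2000,
                                extract(month FROM taken)::int,
                                extract(day FROM taken)::int)))
);

-- Season queries then become range checks, e.g. 1 March to 31 May
-- (day 61 to day 152 of the fixed leap year):
SELECT count(*) FROM climate.measurement
WHERE extract(doy FROM make_date(2000,
                                 extract(month FROM taken)::int,
                                 extract(day FROM taken)::int))
      BETWEEN 61 AND 152;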
Re: [PERFORM] prepared query performs much worse than regular query
On Fri, 21 May 2010, Richard Yen wrote: Any ideas why the query planner chooses a different query plan when using prepared statements? This is a FAQ. Preparing a statement makes Postgres create a plan, without knowing the values that you will plug in, so it will not be as optimal as if the values were available. The whole idea is to avoid the planning cost each time the query is executed, but if your data is unusual it can result in worse plans. Matthew -- Existence is a convenient concept to designate all of the files that an executable program can potentially process. -- Fortran77 standard -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
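The effect is easy to see for yourself (a sketch - the table and value are made up; note that 9.2 and later may eventually switch to a custom plan after a few executions):

PREPARE q(int) AS SELECT * FROM orders WHERE customer_id = $1;
-- The plan was built without knowing $1, so a heavily skewed
-- customer_id gets the same generic plan as everyone else:
EXPLAIN EXECUTE q(42);
DEALLOCATE q;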
Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?
On Wed, 19 May 2010, David Jarvis wrote: extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND That portion of the WHERE clause cannot use an index on m.taken. Postgres does not look inside functions (like extract) to see if something indexable is present. To get an index to work, you could create an index on (extract(YEAR FROM m.taken)). Matthew -- Here we go - the Fairy Godmother redundancy proof. -- Computer Science Lecturer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
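Concretely, either of these approaches would make that predicate indexable (the index names are made up):

-- 1. Index the expression, and keep the query as written:
CREATE INDEX measurement_taken_year_idx
    ON climate.measurement ((extract(year FROM taken)));

-- 2. Or rewrite the condition as a range over the raw column,
--    so an ordinary index on taken can serve it:
CREATE INDEX measurement_taken_idx ON climate.measurement (taken);
SELECT count(*) FROM climate.measurement m
WHERE m.taken >= '1900-01-01' AND m.taken < '2010-01-01';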
Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?
On Thu, 20 May 2010, David Jarvis wrote: I took out the date conditions: SELECT m.* FROM climate.measurement m WHERE m.category_id = 1 and m.station_id = 2043 This uses the station indexes: Yes, because there is only one station_id selected. That's exactly what an index is for. Then combined the selection of the station: The station index is no longer used, resulting in full table scans: Nested Loop (cost=0.00..994.94 rows=4046 width=4) Join Filter: ((6371.009::double precision * sqrt((pow(radians(((c.latitude_decimal - s.latitude_decimal))::double precision), 2::double precision) + (cos((radians(((c.latitude_decimal + s.latitude_decimal))::double precision) / 2::double precision)) * pow(radians(((c.longitude_decimal - s.longitude_decimal))::double precision), 2::double precision) <= 25::double precision) - Index Scan using city_pkey1 on city c (cost=0.00..6.27 rows=1 width=16) Index Cond: (id = 5182) - Seq Scan on station s (cost=0.00..321.08 rows=12138 width=20) Filter: ((s.elevation >= 0) AND (s.elevation <= 3000)) I get a set of 78 rows returned in very little time. (An EXPLAIN ANALYSE would be better here). Look at the expected number of stations returned. It expects 4046 which is a large proportion of the available stations. It therefore expects to have to touch a large proportion of the measurement table, therefore it thinks that it will be fastest to do a seq scan. In actual fact, for 78 stations, the index would be faster, but for 4046 it wouldn't. If you will be querying by season quite regularly, had you considered partitioning by season? Matthew -- Geography is going places. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] merge join killing performance
On Wed, 19 May 2010, Scott Marlowe wrote: It's apparently estimating (wrongly) that the merge join won't have to scan very much of files before it can stop because it finds an eid value larger than any eid in the other table. So the issue here is an inexact stats value for the max eid. I wondered if it could be something like that, but I rejected that idea, as it obviously wasn't the real world case, and statistics should at least get that right, if they are up to date. I changed stats target to 1000 for that field and still get the bad plan. What do the stats say the max values are? Matthew -- Nog: Look! They've made me into an ensign! O'Brien: I didn't know things were going so badly. Nog: Frightening, isn't it? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] merge join killing performance
On Tue, 18 May 2010, Scott Marlowe wrote: Aggregate (cost=902.41..902.42 rows=1 width=4) - Merge Join (cost=869.97..902.40 rows=1 width=4) Merge Cond: (f.eid = ev.eid) - Index Scan using files_eid_idx on files f (cost=0.00..157830.39 rows=3769434 width=8) Okay, that's weird. How is the cost of the merge join only 902, when the cost of one of the branches is 157830 and there is no LIMIT? Are the statistics up to date? Matthew -- As you approach the airport, you see a sign saying Beware - low flying airplanes. There's not a lot you can do about that. Take your hat off? -- Michael Flanders -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Planner not using column limit specified for one column for another column equal to first
On Sat, 17 Apr 2010, Віталій Тимчишин wrote: As for making the planner more clever, maybe it is possible to introduce a division between fast queries and long queries, so that if, after fast planning, the cost is greater than some configurable threshold, advanced planning techniques (or settings) are used. As far as I have seen in this list, many techniques are not used simply because they are too complex and could make planning take too much time for really fast queries, but they are vital for long ones. +1. That's definitely a good idea in my view. The query optimiser I wrote (which sits on top of Postgres and makes use of materialised views to speed up queries) uses a similar approach - it expends effort proportional to the estimated cost of the query, as reported by EXPLAIN. Matthew -- To most people, solutions mean finding the answers. But to chemists, solutions are things that are still all mixed up. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] LIMIT causes planner to do Index Scan using a less optimal index
On Wed, 7 Apr 2010, sherry.ctr@faa.gov wrote: Please just let me know if Postgres can do this kind of index or not. create index idx1 on tb1(col1, col2) Then later we can find it is useful or useless. Have you tried it? Grzegorz Jaśkiewicz gryz...@gmail.com wrote: something like this: create index idx1 on tb1(col1, col2); yup :) For those of you who are not native English speakers, "Yup" is a synonym for "Yes". Matthew -- Richards' Laws of Data Security: 1. Don't buy a computer. 2. If you must buy a computer, don't turn it on. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] experiments in query optimization
On Tue, 30 Mar 2010, Faheem Mitha wrote: work_mem = 1 GB (see diag.{tex/pdf}). Sure, but define sane setting, please. I guess part of the point is that I'm trying to keep memory low You're trying to keep memory usage low, but you have work_mem set to 1GB? Matthew -- Prove to thyself that all circuits that radiateth and upon which thou worketh are grounded, lest they lift thee to high-frequency potential and cause thee to radiate also. -- The Ten Commandments of Electronics -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance regarding LIKE searches
On Mon, 29 Mar 2010, randa...@bioinfo.wsu.edu wrote: WHERE ... lower(n.name) LIKE 'Scaffold:scaffold_163:1000..1199%' ... I'm sure you noticed that this is never going to return any rows? Matthew -- Me... a skeptic? I trust you have proof? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
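To spell it out: lower() only ever produces lower-case text, so a pattern starting with a capital 'S' cannot match anything it returns. For instance:

SELECT lower('Scaffold:scaffold_163:1000..1199')
       LIKE 'Scaffold:scaffold_163:1000..1199%';   -- false: 's' vs 'S'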
Re: [PERFORM] Optimizer showing wrong rows in plan
On Mon, 29 Mar 2010, Tadipathri Raghu wrote: As per the documentation, one page is 8kB, and when I create a table with int as one column it is 4 bytes. If I insert 2000 rows, they should fit in one page as it is 8kB, but the table is extending vastly more than expected. Example shown below, taking the previous example table test with one column. There is more to a row than just the single int column. The space used by a row will include a column start marker (data length), transaction ids, hint bits, an oid, a description of the types of the columns, and finally your data columns. That takes a bit more space. Matthew -- If you let your happiness depend upon how somebody else feels about you, now you have to control how somebody else feels about you. -- Abraham Hicks -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
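The overhead is easy to observe (a sketch using the table from the post; each 4-byte integer row carries roughly a 24-byte tuple header plus a 4-byte line pointer, so expect several pages rather than one):

CREATE TABLE test (col integer);
INSERT INTO test SELECT generate_series(1, 2000);
SELECT pg_relation_size('test') / 8192 AS pages;   -- around 9, not 1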
Re: [PERFORM] memory question
On Wed, 24 Mar 2010, Campbell, Lance wrote: I have 24 Gig of memory on my server... Our server manager seems to think that I have way too much memory. He thinks that we only need 5 Gig. Your organisation probably spent more money getting your server manager to investigate how much RAM you need and scaring you about wasting resources, than it would cost to just slap 24GB in the machine. 24GB is the least amount of RAM I would consider putting in a new server nowadays. It's so cheap. Matthew -- Lord grant me patience, and I want it NOW! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] too complex query plan for not exists query and multicolumn indexes
On Fri, 19 Mar 2010, Stephen Frost wrote: ...it has to go to an external on-disk sort (see later on, and how to fix that). This was covered on this list a few months ago, in http://archives.postgresql.org/pgsql-performance/2009-08/msg00184.php and http://archives.postgresql.org/pgsql-performance/2009-08/msg00189.php There seemed to be some consensus that allowing a materialise in front of an index scan might have been a good change. Was there any movement on this front? Limit (cost=66681.50..66681.50 rows=1 width=139) (actual time=7413.489..7413.489 rows=1 loops=1) - Merge Anti Join (cost=40520.17..66681.50 rows=367793 width=139) (actual time=3705.078..7344.256 rows=101 loops=1) Merge Cond: ((f1.user_id = f2.ref_id) AND (f1.ref_id = f2.user_id)) - Index Scan using user_ref on friends f1 (cost=0.00..26097.86 rows=2818347 width=139) (actual time=0.093..1222.592 rows=1917360 loops=1) - Materialize (cost=40520.17..40555.40 rows=2818347 width=8) (actual time=3704.977..5043.347 rows=1990148 loops=1) - Sort (cost=40520.17..40527.21 rows=2818347 width=8) (actual time=3704.970..4710.703 rows=1990148 loops=1) Sort Key: f2.ref_id, f2.user_id Sort Method: external merge Disk: 49576kB - Seq Scan on friends f2 (cost=0.00..18143.18 rows=2818347 width=8) (actual time=0.015..508.797 rows=2818347 loops=1) Total runtime: 7422.516 ms If you had an index on ref_id,user_id (as well as the one on user_id,ref_id), it'd probably be able to do in-order index traversals on both and be really fast... But then updates would be more expensive, of course, since it'd have more indexes to maintain. That isn't necessarily so, until the issue referred to in the above linked messages is resolved. It depends. Matthew -- I've run DOOM more in the last few days than I have the last few months. I just love debugging ;-) -- Linus Torvalds -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] GiST index performance
On Sat, 20 Mar 2010, Yeb Havinga wrote: The gist virtual pages would then match more the original blocksizes that were used in Guttman's R-tree paper (first google result, then figure 4.5). Since the nature/characteristics of the underlying datatypes and keys is not changed, it might be that with the disk pages getting larger, gist indexing has therefore become unexpectedly inefficient. Yes, that is certainly a factor. For example, the page size for bioseg which we use here is 130 entries, which is very excessive, and doesn't allow very deep trees. On the other hand, it means that a single disc seek performs quite a lot of work. But I am also not really into the core-gist code, but do have a motivation to dive into it (more than 200% performance increase in Matthew's test case). However I'd like to verify for community support before working on it. I'd also love to dive into the core gist code, but am rather daunted by it. I believe that there is something there that is taking more time than I can account for. The indexing algorithm itself is good. Matthew -- The problem with defending the purity of the English language is that English is about as pure as a cribhouse whore. We don't just borrow words; on occasion, English has pursued other languages down alleyways to beat them unconscious and rifle their pockets for new vocabulary. - James Nicoll -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] GiST index performance
On Mon, 22 Mar 2010, Yeb Havinga wrote: Yes, that is certainly a factor. For example, the page size for bioseg which we use here is 130 entries, which is very excessive, and doesn't allow very deep trees. On the other hand, it means that a single disc seek performs quite a lot of work. Yeah, I only did in-memory fitting tests and wondered about increased io's. However I bet that even for bigger than ram db's, the benefit of having to fan out to less pages still outweighs the over-general non leaf nodes and might still result in less disk io's. I redid some earlier benchmarking with other datatypes with a 1kB block size and also multicolumn gist, and the multicolumn variant had an even greater benefit than the single column indexes, both equality and range scans. (Like execution times down to 20% of original). If gist is important to you, I really recommend doing a test with 1kB blocks. Purely from a disc seek count point of view, assuming an infinite CPU speed and infinite disc transfer rate, the larger the index pages the better. The number of seeks per fetch will be equivalent to the depth of the tree. If you take disc transfer rate into account, the break-even point is when you spend an equal time transferring as seeking, which places the page size around 500kB on a modern disc, assuming RAID stripe alignment doesn't make that into two seeks instead of one. However, for efficient CPU usage, the ideal page size for a tree index is much smaller - between two and ten entries, depending on the type of the data. There may be some mileage in reorganising indexes into a two-level system. That is, have an index format where the page size is 512kB or similar, but each page is internally a CPU-efficient tree itself. However, this is beyond the scope of the problem of speeding up gist. Matthew -- If you let your happiness depend upon how somebody else feels about you, now you have to control how somebody else feels about you. -- Abraham Hicks -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] pg_dump far too slow
On Sun, 14 Mar 2010, David Newall wrote: nohup time pg_dump -f database.dmp -Z9 database I presumed pg_dump was CPU-bound because of gzip compression, but a test I ran makes that seem unlikely... There was some discussion about this a few months ago at http://archives.postgresql.org/pgsql-performance/2009-07/msg00348.php It seems that getting pg_dump to do the compression is a fair amount slower than piping the plain format dump straight through gzip. You get a bit more parallelism that way too. Matthew -- I'm always interested when [cold callers] try to flog conservatories. Anyone who can actually attach a conservatory to a fourth floor flat stands a marginally better than average chance of winning my custom. (Seen on Usenet) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Deleting bytea, autovacuum, and 8.2/8.4 differences
On Mon, 15 Mar 2010, Tom Lane wrote: For an example like this one, you have to keep in mind that the toast-table rows for the large bytea value have to be marked deleted, too. Also, since I/O happens in units of pages, the I/O volume to delete a tuple is just as much as the I/O to create it. (The WAL entry for deletion might be smaller, but that's all.) So it is entirely unsurprising that DELETE FROM foo is about as expensive as filling the table initially. If deleting a whole table is significant for you performance-wise, you might look into using TRUNCATE instead. What are the implications of using TRUNCATE on a table that has TOASTed data? Is TOAST all stored in one single table, or is it split up by owner table/column name? Might you still end up with a normal delete operation on the TOAST table when performing a TRUNCATE on the owner table? Matthew -- sed -e '/^[when][coders]/!d;/^...[discover].$/d;/^..[real].[code]$/!d ' `locate dict/words` -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
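For what it's worth: TOAST storage is split up per table - each table with TOASTable columns gets its own TOAST relation, and TRUNCATE clears that relation the same cheap way, with no per-row deletes. A quick way to find a table's TOAST relation (the table name foo is a placeholder):

SELECT reltoastrelid::regclass FROM pg_class WHERE relname = 'foo';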
Re: [PERFORM] GiST index performance
On Thu, 25 Feb 2010, Bruce Momjian wrote: Was there ever any conclusion on this issue? Not really. Comments inline: Matthew Wakeling wrote: Revisiting the thread a month back or so, I'm still investigating performance problems with GiST indexes in Postgres. Looking at http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items I'd like to clarify the contrib/seg issue. Contrib/seg is vulnerable to pathological behaviour which is fixed by my second patch, which can be viewed as complete. Contrib/cube, being multi-dimensional, is not affected to any significant degree, so should not need alteration. This issue is addressed by my patch, which AFAIK no one has reviewed. However, that patch was derived from a patch that I applied to bioseg, which is itself a derivative of seg. This patch works very well indeed, and gave an approximate 100 times speed improvement in the one test I ran. So you could say that the sister patch of the one I submitted is tried and tested in production. A second quite distinct issue is the general performance of GiST indexes which is also mentioned in the old thread linked from Open Items. For that, we have a test case at http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php for btree_gist indexes. I have a similar example with the bioseg GiST index. I have completely reimplemented the same algorithms in Java for algorithm investigation and instrumentation purposes, and it runs about a hundred times faster than in Postgres. I think this is a problem, and I'm willing to do some investigation to try and solve it. I have not made any progress on this issue. I think Oleg and Teodor would be better placed working it out. All I can say is that I implemented the exact same indexing algorithm in Java, and it performed 100 times faster than Postgres. Now, Postgres has to do a lot of additional work, like mapping the index onto disc, locking pages, and abstracting to plugin user functions, so I would expect some difference - I'm not sure 100 times is reasonable though. I tried to do some profiling, but couldn't see any one section of code that was taking too much time. Not sure what I can further do. Matthew -- Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. -- Jamie Zawinski -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Dell PERC H700/H800
Just a heads up - apparently the more recent Dell RAID controllers will no longer recognise hard discs that weren't sold through Dell. http://www.channelregister.co.uk/2010/02/10/dell_perc_11th_gen_qualified_hdds_only/ As one of the comments points out, that kind of makes them no longer SATA or SAS compatible, and they shouldn't be allowed to use those acronyms any more. Matthew -- An optimist sees the glass as half full, a pessimist as half empty, and an engineer as having redundant storage capacity. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Optimizing Postgresql server and FreeBSD for heavy read and writes
On Thu, 4 Feb 2010, Amitabh Kant wrote: On Wed, Feb 3, 2010 at 10:05 PM, Ivan Voras ivo...@freebsd.org wrote: If you can, add another 2 drives in RAID 1 and move+symlink the pg_xlog directory to the new array. Can't do anything about this server now, but would surely keep in mind before upgrading other servers. Would you recommend the same speed drives(15K SAS) for RAID 1, or would a slower drive also work here (10K SAS or even SATA II)? The performance requirements for the WAL are significantly lower than for the main database. This is for two reasons - firstly the WAL is write-only, and has no other activity. The WAL only gets read again in the event of a crash. Secondly, writes to the WAL are sequential writes, which is the fastest mode of operation for a disc, whereas the main database discs will have to handle random access. The main thing you need to make sure of is that the WAL is on a disc system that has a battery-backed up cache. That way, it will be able to handle the high rate of fsyncs that the WAL generates, and the cache will convert that into a simple sequential write. Otherwise, you will be limited to one fsync every 5ms (or whatever the access speed of your WAL discs is). If you make sure of that, then there is no reason to get expensive fast discs for the WAL at all (assuming they are expensive enough to not lie about flushing writes properly). Matthew -- So, given 'D' is undeclared too, with a default of zero, C++ is equal to D. mnw21, commenting on the Surely the value of C++ is zero, but C is now 1 response to No, C++ isn't equal to D. 'C' is undeclared [...] C++ should really be called 1 response to C++ -- shouldn't it be called D? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] System overload / context switching / oom, 8.3
On Tue, 2 Feb 2010, Rob wrote: pg 8.3.9, Debian Etch, 8gb ram, quadcore xeon, megaraid (more details at end) ~240 active databases, 800+ db connections via tcp. Linux 2.6.18-6-686-bigmem #1 SMP Thu Nov 5 17:30:05 UTC 2009 i686 GNU/Linux (Debian Etch) 8 GB RAM 4 Quad Core Intel(R) Xeon(R) CPU E5440 @ 2.83GHz stepping 06 My advice? 1. Switch to 64-bit operating system and Postgres. Debian provides that, and it works a charm. You have a 64-bit system, so why not use it? 2. Buy more RAM. Think about it - you have 800 individual processes running on your box, and they will all want their own process space. To be honest, I'm impressed that the current machine works at all. You can get an idea of how much RAM you might need by multiplying the number of connections by (work_mem + about 3MB), and add on shared_buffers. So even when the system is idle you're currently burning 3200MB just sustaining 800 processes - more if they are actually doing something. 3. Try to reduce the number of connections to the database server. 4. Think about your work_mem. Finding the correct value for you is going to be a matter of testing. Smaller values will result in large queries running slowly; larger values have the danger of driving the system to swap and OOM. Matthew -- A good programmer is one who looks both ways before crossing a one-way street. Considering the quality and quantity of one-way streets in Cambridge, it should be no surprise that there are so many good programmers there. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
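That estimate can even be computed from a live server's settings (a sketch; pg_size_bytes appeared in 9.6, so on the 8.3 system above you would do the arithmetic by hand):

SELECT pg_size_bytes(current_setting('shared_buffers'))
     + current_setting('max_connections')::int
       * (pg_size_bytes(current_setting('work_mem')) + 3 * 1024 * 1024)
  AS rough_peak_bytes;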
Re: [PERFORM] Benchmark shows very slow bulk delete
On Wed, 27 Jan 2010, Thom Brown wrote: Had a quick look at a benchmark someone put together of MySQL vs PostgreSQL, and while PostgreSQL is generally faster, I noticed the bulk delete was very slow: http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html Is this normal? On the contrary, TRUNCATE TABLE is really rather fast. Seriously, the Postgres developers, when designing the system, decided on a database layout that was optimised for the most common cases. Bulk deletion of data is not really that common an operation, unless you are deleting whole categories of data, where setting up partitioning and deleting whole partitions would be sensible. Other complications are that the server has to maintain concurrent integrity - that is, another transaction must be able to see either none of the changes or all of them. As a consequence of this, Postgres needs to do a sequential scan through the table and mark the rows for deletion in the transaction, before flipping the transaction committed status and cleaning up afterwards. I'd be interested in how mysql manages to delete a whole load of rows in 0.02 seconds. How many rows is that? (Reading in the comments, I saw this: "The slow times for Postgresql Bulk Modify/Bulk Delete can be explained by foreign key references to the updates table." I'm not sure that fully explains it though, unless there are basically zero rows being deleted - it's hardly bulk then, is it?) Matthew -- People who love sausages, respect the law, and work with IT standards shouldn't watch any of them being made. -- Peter Gutmann -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
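Where bulk deletion really is a routine operation, the partitioning approach mentioned above turns it into a metadata change. A sketch using the inheritance mechanism current at the time (all names made up):

CREATE TABLE logs (logdate date NOT NULL, payload text);
CREATE TABLE logs_2010_01 (
    CHECK (logdate >= '2010-01-01' AND logdate < '2010-02-01')
) INHERITS (logs);

-- Dropping a month of data touches no rows at all:
DROP TABLE logs_2010_01;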
Re: [PERFORM] test send (recommended by Dave Page)
On Wed, 27 Jan 2010, Mark Steben wrote: Subject: [PERFORM] test send (recommended by Dave Page) Hi all - sorry to create additional email 'noise', but I've been trying to post a rather long query to the pgsql-performance user list. Dave thought that it might have been bounced due to the length and suggested I send a short 'blast'. If this works I'll send a shortened version of my query later. Whatever you do, don't try to send an email to the list with the word "help" in the subject. The mailing list software will silently throw away your email. Helpful, for a help mailing list. Matthew -- The early bird gets the worm, but the second mouse gets the cheese. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?
On Wed, 27 Jan 2010, Віталій Тимчишин wrote: How about SELECT SUM (case when id > 120 and id < 121 then 1 end) from tbl_tracker; That is very interesting. * All the functions should be noop for null input Alas, not true for COUNT(*), AVG(), etc. Matthew -- An optimist sees the glass as half full, a pessimist as half empty, and an engineer as having redundant storage capacity. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] splitting data into multiple tables
On Mon, 25 Jan 2010, Viji V Nair wrote: I think this won't help that much if you have a single machine. Partition the table and keep the data in different nodes. Have a look at the tools like pgpool.II So partitioning. You have three choices: 1. Use a single table 2. Partition the table on the same server 3. Partition the data across multiple servers. This is in increasing order of complexity. There will probably be no problem at all with option 1. The only problem arises if you run a query that performs a full sequential scan of the entire table, which would obviously take a while. If your queries are indexable, then option 1 is almost certainly the best option. Option 2 adds complexity in the Postgres server. You will need to partition your tables in a logical manner - that is, there needs to be some difference between rows in table a compared to rows in table b. This means that the partitioning will in effect be a little like indexing. You do not want to have too many partitions. The advantage is that if a query requires a full sequential scan, then there is the possibility of skipping some of the partitions, although there is some complexity involved in getting this to work correctly. In a lot of cases, partitioning will make queries slower by confusing the planner. Option 3 is only useful when you have a real performance problem with long-running queries (partitioning the data across servers) or with very large numbers of queries (duplicating the data across servers). It also adds much complexity. It is fairly simple to run "filter these results from the table" queries across multiple servers, but if that was all you were doing, you may as well use an index instead. It becomes impossible to perform proper cross-referencing queries without some very clever software (because not all the data is available on the server), which will probably be hard to manage and slow down the execution anyway. My recommendation would be to stick with a single table unless you have a real need to partition. Matthew -- Note: some countries impose serious penalties for a conspiracy to overthrow the political system. THIS DOES NOT FIX THE VULNERABILITY. -- http://seclists.org/vulnwatch/2003/q2/0002.html -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
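A sketch of option 2, showing the planner skipping partitions whose CHECK constraints contradict the WHERE clause (all names made up; constraint_exclusion must be enabled):

CREATE TABLE readings (station integer NOT NULL, val numeric);
CREATE TABLE readings_s1 (CHECK (station < 1000)) INHERITS (readings);
CREATE TABLE readings_s2 (CHECK (station >= 1000)) INHERITS (readings);

SET constraint_exclusion = on;
-- Only readings_s1 (plus the empty parent) will be scanned:
EXPLAIN SELECT * FROM readings WHERE station = 42;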
Re: [PERFORM] splitting data into multiple tables
On Mon, 25 Jan 2010, nair rajiv wrote: I am working on a project that will take out structured content from wikipedia and put it in our database... there is a table which will approximately have 5 crore entries after data harvesting. Have you asked the Wikimedia Foundation if they mind you consuming that much of their bandwidth, or even if there are copyright issues involved in grabbing that much of their data? (The other problem with using the word crore is that not everyone will know that it means 10,000,000 - so 5 crore is 50 million rows.) Matthew -- Of course it's your fault. Everything here's your fault - it says so in your contract.- Quark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?
On Tue, 26 Jan 2010, Richard Neill wrote: SELECT SUM (case when id > 120 and id < 121 then 1 else 0 end) from tbl_tracker; Explain shows that this does a sequential scan. I'd defer to Tom on this one, but really, for Postgres to work this out, it would have to peer deep into the mysterious SUM function, and realise that the number zero is a noop. I suppose it would be possible, but you'd have to define noops for each of the different possible functions, *and* make the planner clever enough to spot the noop-matching number in the else and convert the WHEN into a WHERE. In my mind, this is quite a lot of work for the planner to do to solve this one. That translates into quite a lot of work for some poor programmer to do to achieve it. If you have the money, then hire someone to do it! Matthew -- I don't want the truth. I want something I can tell parliament! -- Rt. Hon. Jim Hacker MP -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
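For comparison, here is the manual rewrite the planner would have to discover for itself (assuming the operators the poster intended):

-- CASE form: every row is read, and SUM does the filtering
SELECT SUM(CASE WHEN id > 120 AND id < 121 THEN 1 ELSE 0 END)
FROM tbl_tracker;

-- WHERE form: an index on id can satisfy the predicate directly
SELECT count(*) FROM tbl_tracker WHERE id > 120 AND id < 121;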
Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
On Mon, 25 Jan 2010, Richard Huxton wrote: OK - so the first query processes 19,799 rows in 31,219 ms (about 1.5ms per row) The second processes 2,606 rows in 3,813 ms (about 1.3ms per row). Agreed. One query is faster than the other because it has to do an eighth the amount of work. Matthew -- I wouldn't be so paranoid if you weren't all out to get me!! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Sql result b where condition
On Mon, 25 Jan 2010, A. Kretschmer wrote: In response to ramasubramanian : Please, create a new mail for a new topic and don't hijack other threads. Even more so - this probably isn't the right mailing list for generic sql help questions. select ENAME,ORIG_SALARY from employee where (ename='Tom' and orig_salary=2413)or(orig_salary=1234) If the first condition (ename='Tom' and orig_salary=2413) is satisfied then 10 rows will be returned; for the second condition (orig_salary=1234) 20 rows will be returned. The order of display should be the first 10 rows, then the next 20 rows. select ENAME, ORIG_SALARY, 1 as my_order from employee where (ename='Tom' and orig_salary=2413) union all select ENAME, ORIG_SALARY, 2 as my_order from employee where (orig_salary=1234) order by my_order; Or just: select ENAME, ORIG_SALARY from employee where (ename='Tom' and orig_salary=2413) or (orig_salary=1234) ORDER BY orig_salary DESC - as there are only going to be two values for orig_salary. Matthew -- The early bird gets the worm. If you want something else for breakfast, get up later. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
On Fri, 22 Jan 2010, Tory M Blue wrote: But the same sql that returns maybe 500 rows is pretty fast, it's the return of 10K+ rows that seems to stall and is CPU Bound. Okay, so you have two differing cases. Show us the EXPLAIN ANALYSE for both of them, and we will see what the difference is. Matthew -- The third years are wandering about all worried at the moment because they have to hand in their final projects. Please be sympathetic to them, say things like ha-ha-ha, but in a sympathetic tone of voice -- Computer Science Lecturer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] a heavy duty operation on an unused table kills my server
On Wed, 20 Jan 2010, Greg Smith wrote: Basically, to an extent, that's right. However, when you get 16 drives or more into a system, then it starts being an issue. I guess if I test a system with *only* 16 drives in it one day, maybe I'll find out. *Curious* What sorts of systems have you tried so far? As the graph I just sent shows, the four schedulers are pretty-much identical in performance, until you start saturating it with simultaneous requests. CFQ levels out at a performance a little lower than the other three. Seriously though, there is some difference between a completely synthetic test like you noted issues with here, and anything you can see when running the database. Granted, this test is rather synthetic. It is testing the rather unusual case of lots of simultaneous random small requests - more simultaneous requests than we advise people to run backends on a server. You'd probably need to get a RAID array a whole lot bigger than 16 drives to have a normal workload capable of demonstrating the performance difference, and even that isn't particularly major. Would be interesting research if anyone has a 200-spindle RAID array hanging around somewhere. Matthew -- A good programmer is one who looks both ways before crossing a one-way street. Considering the quality and quantity of one-way streets in Cambridge, it should be no surprise that there are so many good programmers there. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?
On Thu, 21 Jan 2010, Greg Smith wrote: In the attachment you'll find 2 screenshots perfmon34.png and perfmon35.png (I hope 2x14 kb is o.k. for the mailing list). I don't think they made it to the list? No, it seems that no emails with image attachments ever make it through the list server. Someone mentioned something about banning the guy who set the list up from the internet or something. http://archives.postgresql.org/pgsql-performance/2008-01/msg00290.php Matthew -- Bashir: The point is, if you lie all the time, nobody will believe you, even when you're telling the truth. (RE: The boy who cried wolf) Garak: Are you sure that's the point, Doctor? Bashir: What else could it be?-- Star Trek DS9 Garak: That you should never tell the same lie twice. -- Improbable Cause -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] a heavy duty operation on an unused table kills my server
On Fri, 15 Jan 2010, Greg Smith wrote: It seems to me that CFQ is simply bandwidth limited by the extra processing it has to perform. I'm curious what you are doing when you see this. 16 disc 15kRPM RAID0, when using fadvise with more than 100 simultaneous 8kB random requests. I sent an email to the mailing list on 29 Jan 2008, but it got thrown away by the mailing list spam filter because it had an image in it (the graph showing interesting information). Gregory Stark replied to it in http://archives.postgresql.org/pgsql-performance/2008-01/msg00285.php I was using his synthetic test case program. My theory has been that the extra processing it has to perform you describe just doesn't matter in the context of a fast system where physical I/O is always the bottleneck. Basically, to an extent, that's right. However, when you get 16 drives or more into a system, then it starts being an issue. Matthew -- For every complex problem, there is a solution that is simple, neat, and wrong. -- H. L. Mencken -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] a heavy duty operation on an unused table kills my server
On Thu, 14 Jan 2010, Greg Smith wrote: Andy Colson wrote: So if there is very little io, or if there is way way too much, then the scheduler really doesn't matter. So there is a slim middle ground where the io is within a small percent of the HD capacity where the scheduler might make a difference? That's basically how I see it. There seem to be people who run into workloads in the middle ground where the scheduler makes a world of difference. I've never seen one myself, and suspect that some of the reports of deadline being a big improvement just relate to some buginess in the default CFQ implementation that I just haven't encountered. That's the perception I get. CFQ is the default scheduler, but in most systems I have seen, it performs worse than the other three schedulers, all of which seem to have identical performance. I would avoid anticipatory on a RAID array though. It seems to me that CFQ is simply bandwidth limited by the extra processing it has to perform. Matthew -- Experience is what allows you to recognise a mistake the second time you make it. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?
On Thu, 14 Jan 2010, fka...@googlemail.com wrote: The data needs to be written first to the WAL, in order to provide crash-safety. So you're actually writing 1600MB, not 800. I understand. So the actual throughput is 32MB/s which is closer to 43 MB/s, of course. Can I verify that by temporarily disabling WAL writes completely and see if the throughput is then doubled? There isn't a magic setting in Postgres to disable the WAL. That would be far too tempting, and an easy way to break the database. However, what you can do is to insert the data into the table in the same transaction as creating the table. Then Postgres knows that no other transactions can see the table, so it doesn't need to be so careful. Unfortunately, I don't think even this strategy will work in your case, as you will be writing to the large object table, which already exists. Could someone who knows confirm this? Matthew -- Let's say I go into a field and I hear baa baa baa. Now, how do I work out whether that was baa followed by baa baa, or if it was baa baa followed by baa? - Computer Science Lecturer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
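A minimal sketch of the same-transaction trick described above, assuming WAL archiving is not enabled (the optimisation is skipped when it is); the table and file names here are illustrative:

    BEGIN;
    CREATE TABLE bulk_data (payload bytea);
    -- Because bulk_data was created in this same transaction, Postgres can
    -- skip WAL-logging the load (only while WAL archiving is switched off).
    COPY bulk_data FROM '/tmp/bulk_data.copy';
    COMMIT;

As noted above, this won't help when the writes go to a pre-existing table such as the large object table.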
Re: [PERFORM] new server I/O setup
On Thu, 14 Jan 2010, Scott Marlowe wrote: I've just received this new server: 1 x XEON 5520 Quad Core w/ HT 8 GB RAM 1066 MHz 16 x SATA II Seagate Barracuda 7200.12 3ware 9650SE w/ 256MB BBU 2 discs in RAID 1 for OS + pg_xlog partitioned with ext2. 12 discs in RAID 10 for postgres data, sole partition with ext3. 2 spares I think your first choice is right. I use the same basic setup with 147G 15k5 SAS seagate drives and the pg_xlog / OS partition is almost never close to the same level of utilization, according to iostat, as the main 12 disk RAID-10 array is. We may have to buy a 16 disk array to keep up with load, and it would be all main data storage, and our pg_xlog main drive pair would be just fine. The benefits of splitting off a couple of discs for WAL are dubious given the BBU cache, since the cache will convert the frequent fsyncs to sequential writes anyway. My advice would be to test the difference. If the bottleneck is random writes on the 12-disc array, then it may actually help more to improve that to a 14-disc array instead. I'd also question whether you need two hot spares, with RAID-10. Obviously that's a judgement call only you can make, but you could consider whether it is sufficient to just have a spare disc sitting on a shelf next to the server rather than using up a slot in the server. Depends on how quickly you can get to the server on failure, and how important the data is. Matthew -- In the beginning was the word, and the word was unsigned, and the main() {} was without form and void... -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?
On Thu, 14 Jan 2010, fka...@googlemail.com wrote: Nevertheless: If your explanation covers all that can be said about it then replacing the hard disk with a faster one should increase the performance here (I'll try to check that out). Probably. However, it is worth you running the test again, and looking at how busy the CPU on the machine is. The disc may be the bottleneck, or the CPU may be the bottleneck. Matthew -- Take care that thou useth the proper method when thou taketh the measure of high-voltage circuits so that thou doth not incinerate both thee and the meter; for verily, though thou has no account number and can be easily replaced, the meter doth have one, and as a consequence, bringeth much woe upon the Supply Department. -- The Ten Commandments of Electronics -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] new server I/O setup
On Fri, 15 Jan 2010, Fernando Hevia wrote: I was wondering if disabling the bbu cache on the RAID 1 array would make any difference. All 256MB would be available for the random I/O on the RAID 10. That would be pretty disastrous, to be honest. The benefit of the cache is not only that it smooths random access, but it also accelerates fsync. The whole point of the WAL disc is for it to be able to accept lots of fsyncs very quickly, and it can't do that without its BBU cache. Matthew -- Heat is work, and work's a curse. All the heat in the universe, it's going to cool down, because it can't increase, then there'll be no more work, and there'll be perfect peace. -- Michael Flanders -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] a heavy duty operation on an unused table kills my server
On Fri, 15 Jan 2010, Craig James wrote: That's the perception I get. CFQ is the default scheduler, but in most systems I have seen, it performs worse than the other three schedulers, all of which seem to have identical performance. I would avoid anticipatory on a RAID array though. I thought the best strategy for a good RAID controller was NOOP. Agreed. That's what we use here. My observation, though, is that noop is identical in performance to anticipatory and deadline. Theoretically, it should be faster. Matthew -- Take care that thou useth the proper method when thou taketh the measure of high-voltage circuits so that thou doth not incinerate both thee and the meter; for verily, though thou has no account number and can be easily replaced, the meter doth have one, and as a consequence, bringeth much woe upon the Supply Department. -- The Ten Commandments of Electronics -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?
On Thu, 14 Jan 2010, fka...@googlemail.com wrote: This takes about 50s, so, 800MB/50s = 16MB/s. However the harddisk (sata) could write 43 MB/s in the worst case! Why is write performance limited to 16 MB/s? Several reasons: The data needs to be written first to the WAL, in order to provide crash-safety. So you're actually writing 1600MB, not 800. Postgres needs to update a few other things on disc (indexes on the large object table maybe?), and needs to call fsync a couple of times. That'll add a bit of time. Your discs can't write 43MB/s in the *worst case* - the worst case is lots of little writes scattered over the disc, where it would be lucky to manage 1MB/s. Not all of the writes Postgres makes are sequential. A handy way of knowing how sequential the writes are is to listen to the disc as it writes - the clicking sounds are where it has to waste time moving the disc head from one part of the disc to another. Matthew -- No trees were killed in the sending of this message. However a large number of electrons were terribly inconvenienced. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow Select count(*) ... query on table with 60 Mio. rows
On Thu, 14 Jan 2010, tom wrote: I have a db-table data_measurand with about 60,000,000 (60 million) rows and the following query takes about 20-30 seconds (with psql): mydb=# select count(*) from data_measurand; count -- 60846187 (1 row) Sounds pretty reasonable to me. Looking at your table, the rows are maybe 200 bytes wide? That's 12GB of data for Postgres to munch through. 30 seconds is really rather quick for that (400MB/s). What sort of RAID array is managing to give you that much? I use a software raid and LVM for Logical Volume Management. Filesystem is ext3. Ditch lvm. This is an FAQ. Counting the rows in a table is an expensive operation in Postgres. It can't be answered directly from an index. If you want, you can keep track of the number of rows yourself with triggers, but beware that this will slow down write access to the table. Matthew -- Nog: Look! They've made me into an ensign! O'Brien: I didn't know things were going so badly. Nog: Frightening, isn't it? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
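A rough sketch of the trigger-based counting mentioned above, assuming plpgsql is installed; the counter table, function, and trigger names are illustrative:

    CREATE TABLE row_counts (tbl text PRIMARY KEY, n bigint NOT NULL);
    INSERT INTO row_counts SELECT 'data_measurand', count(*) FROM data_measurand;

    CREATE FUNCTION maintain_row_count() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE row_counts SET n = n + 1 WHERE tbl = TG_TABLE_NAME;
            RETURN NEW;
        ELSE  -- DELETE
            UPDATE row_counts SET n = n - 1 WHERE tbl = TG_TABLE_NAME;
            RETURN OLD;
        END IF;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER data_measurand_count AFTER INSERT OR DELETE ON data_measurand
        FOR EACH ROW EXECUTE PROCEDURE maintain_row_count();

SELECT n FROM row_counts WHERE tbl = 'data_measurand' then returns instantly, but every concurrent writer now serialises on that one counter row - the write slowdown warned about above.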
Re: [PERFORM] performance config help
On Mon, 11 Jan 2010, Bob Dusek wrote: How do I learn more about the actual lock contention in my db? Lock contention makes some sense. Each of the 256 requests is relatively similar. So, I don't doubt that lock contention could be an issue. I just don't know how to observe it or correct it. It seems like if we have processes that are contending for locks, there's not much we can do about it.

To me:
1. This doesn't look like an IO bandwidth issue, as the database is small.
2. This doesn't look like a CPU speed issue, as usage is low.
3. This doesn't look like a memory bandwidth issue, as that would count as CPU active in top.
4. This doesn't look like a memory size problem either.
So, what's left?

It could be a network bandwidth problem, if your client is on a separate server. You haven't really given much detail about the nature of the queries, so it is difficult for us to tell if the size of the results means that you are maxing out your network. However, it doesn't sound like it is likely to me that this is the problem.

It could be a client bottleneck problem - maybe your server is performing really well, but your client can't keep up. You may be able to determine this by switching on logging of long-running queries in Postgres, and comparing that with what your client says. Also, look at the resource usage on the client machine.

It could be a lock contention problem. To me, this feels like the most likely. You say that the queries are similar. If you are reading and writing from a small set of the same objects in each of the transactions, then you will suffer badly from lock contention, as only one backend can be in a read-modify-write cycle on a given object at a time (one way to observe this directly is the pg_locks query sketched below). We don't know enough about the data and queries to say whether this is the case. However, if you have a common object that every request touches (like a status line or something), then try to engineer that out of the system.

Hope this helps. Synchronising forty processes around accessing a single object for high performance is really hard, and Postgres does it incredibly well, but it is still by far the best option to avoid contention completely if possible. -Kevin

It'd really help us reading your emails if you could make sure that it is easy to distinguish your words from words you are quoting. It can be very confusing reading some of your emails, trying to remember which bits I have seen before written by someone else. This is one of the few lines that I know you didn't write - you're a Bob, not a Kevin. A few characters at the beginning of lines, which most mail readers will add automatically, make all the difference. Matthew -- Me... a skeptic? I trust you have proof? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
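The pg_locks sketch referred to above, using the pre-9.2 catalog column names (procpid, current_query) that match the Postgres versions in this thread - run it while the clients are busy:

    SELECT l.pid, l.locktype, l.relation::regclass, l.mode, a.current_query
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.procpid = l.pid
    WHERE NOT l.granted;

Any rows returned are backends waiting on a lock someone else holds. On 8.3 or later, setting log_lock_waits = on will additionally log every wait that lasts longer than deadlock_timeout.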
Re: [PERFORM] performance config help
On Tue, 12 Jan 2010, Bob Dusek wrote: Each of the concurrent clients does a series of selects, inserts, updates, and deletes. The requests would generally never update or delete the same rows in a table. However, the requests do generally write to the same tables. And, they are all reading from the same tables that they're writing to. For the inserts, I imagine they are blocking on access to the sequence that controls the primary keys for the insert tables. I'm going to have to bow out at this stage, and let someone else who knows more about what gets locked in a transaction help instead. The sequence may be significant, but I would have thought it would have to be something a bit bigger that is gumming up the works. I'm really sorry. I'm using gmail's interface. Actually, you weren't doing anything particularly wrong as it turns out. It is partly a case of alpine being too clever for its own good, just as Kevin pointed out. My mail reader is taking the most preferred mime alternative, which is the HTML version, and interpreting it to its best ability, which isn't very well. It is the email that says which alternative is preferred, by the way. I have just forced alpine to view the plain text version instead, and it is much better. I just saw the Plain Text formatter at the top of this compose message. But, if I convert it to Plain Text now, I may lose my portion of the message. I'll use the Plain Text when posting future messages. To be honest, that's always a good idea, although you didn't actually do wrong. I do know people whose spam filters immediately discard emails that contain a HTML alternative - that's taking it to the extreme! Matthew -- Beware of bugs in the above code; I have only proved it correct, not tried it. --Donald Knuth -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Choice of bitmap scan over index scan
On Mon, 11 Jan 2010, Mathieu De Zutter wrote: seq_page_cost = 0.1 random_page_cost = 0.1 So if this query usually does *not* hit the cache, it will probably be faster if I leave it like that? While testing a query I execute it so often that it is always in the cache. However, since other applications run on the same server, I think that infrequently used data gets flushed after a while, even if the DB could fit in the RAM. Postgres is being conservative. The plan it uses (bitmap index scan) will perform much better than an index scan when the data is not in the cache, by maybe an order of magnitude, depending on your hardware setup. The index scan may perform better at the moment, but the bitmap index scan is safer. Matthew -- Change is inevitable, except from vending machines. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
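One way to test this without touching the server-wide configuration is to flip the costs back to the shipped defaults for a single session and compare the plans (substitute the real query for the SELECT placeholder):

    SET seq_page_cost = 1.0;      -- the defaults, modelling uncached data
    SET random_page_cost = 4.0;
    EXPLAIN ANALYZE SELECT ...;   -- expect the bitmap index scan here

    SET seq_page_cost = 0.1;      -- the current settings, modelling cached data
    SET random_page_cost = 0.1;
    EXPLAIN ANALYZE SELECT ...;   -- may switch to the plain index scan

SET only affects the current session, so the other applications on the server keep the configured values.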
Re: [PERFORM] Digesting explain analyze
On Thu, 7 Jan 2010, Jesper Krogh wrote: If disk seeks are killing you a kinda crazy idea would be to duplicate the table - clustering one by (id1) and the other one by an index on (id2) and unioning the results of each. That's doubling the disk space needs for the table. Are there any odds that this would benefit when the initial table significantly exceeds available memory by itself? If the table already greatly exceeds the available RAM, then doubling the amount of data won't make a massive difference to performance. You're going to disc for everything anyway. Since each of these duplicates of the table will be clustered by the column you're querying it on, it should just take one seek in each table. Then your query could be something like select * from ( select * from t1 where id1=2067 order by evalue limit 100 union select * from t2 where id2=2067 order by evalue limit 100 ) as foo order by evalue limit 100; This is actually what I ended up with as the best performing query, just still on a single table, because without duplication I can add indexes and optimise this one by (id1,evalue) and (id2,evalue). It is still getting killed quite a lot by disk IO. So I guess I'm up to: You're kind of missing the point. The crucial step in the above suggestion is to cluster the table on the index. This will mean that all the rows that are fetched together are located together on disc, and you will no longer be killed by disc IO. 1) Buy better disks (I need to get an estimate how large it actually is going to get). Unless you cluster, you are still going to be limited by the rate at which the discs can seek. Postgres 8.4 has some improvements here for bitmap index scans if you have a RAID array, and set the effective_io_concurrency setting correctly. 2) Stick with one table, but make sure to have enough activity to get a large part of the index in the OS cache anyway (and add more memory if necessary). In order to win here, you will need to make memory at least as big as the commonly-accessed parts of the database. This could get expensive. I didn't cluster it, since clustering locks everything. You can also test out the hypothesis by copying the table instead: CREATE TABLE test1 AS SELECT * FROM table1 ORDER BY id1; Then create an index on id1, and test against that table. The copy will become out of date quickly, but it will allow you to see whether the performance benefit is worth it. It will also tell you how long a cluster will actually take, without actually locking anything. Matthew -- In the beginning was the word, and the word was unsigned, and the main() {} was without form and void... -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
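A sketch of that copy-and-test idea, reusing the illustrative names from the message above (table1, id1, evalue stand in for the real schema):

    CREATE TABLE test1 AS SELECT * FROM table1 ORDER BY id1;
    CREATE INDEX test1_id1_evalue_idx ON test1 (id1, evalue);
    ANALYZE test1;
    EXPLAIN ANALYZE
        SELECT * FROM test1 WHERE id1 = 2067 ORDER BY evalue LIMIT 100;

If the rows for a given id1 now sit on a handful of adjacent pages, the runtime against test1 shows roughly what a CLUSTER of the real table would buy.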
Re: [PERFORM] Air-traffic benchmark
On Thu, 7 Jan 2010, Gurgel, Flavio wrote: If one single query execution had a step that brought a page to the buffercache, it's enough to increase another step speed and change the execution plan, since the data access in memory is (usually) faster than disk. Postgres does not change a query plan according to the shared_buffers setting. It does not anticipate one step contributing to another step in this way. It does however make use of the effective_cache_size setting to estimate this effect, and that does affect the planner. The use of the index over seqscan has to be tested. I don't agree in 50% gain, since simple integers stored on B-Tree have a huge possibility of being retrieved in the required order, and the discarded data will be discarded quickly too, so the gain has to be measured. I bet that an index scan will be a lot faster, but it's just a bet :) In a situation like this, the opposite will be true. If you were accessing a very small part of a table, say to order by a field with a small limit, then an index can be very useful by providing the results in the correct order. However, in this case, almost the entire table has to be read. Changing the order in which it is read will mean that the disc access is no longer sequential, which will slow things down, not speed them up. The Postgres planner isn't stupid (mostly), there is probably a good reason why it isn't using an index scan. The table is very wide, which is probably why the tested databases can deal with it faster than PG. You could try and narrow the table down (for instance: remove the Div* fields) to make the data more relational-like. In real life, speedups in these circumstances would probably be gained by normalizing the data to make the basic table smaller and easier to use with indexing. Ugh. I don't think so. That's why indexes were invented. PostgreSQL is smart enough to jump over columns using byte offsets. A better option for this table is to partition it in year (or year/month) chunks. Postgres (mostly) stores the columns for a row together with a row, so what you say is completely wrong. Postgres does not jump over columns using byte offsets in this way. The index references a row in a page on disc, and that page is fetched separately in order to retrieve the row. The expensive part is physically moving the disc head to the right part of the disc in order to fetch the correct page from the disc - jumping over columns will not help with that at all. Reducing the width of the table will greatly improve the performance of a sequential scan, as it will reduce the size of the table on disc, and therefore the time taken to read the entire table sequentially. Moreover, your suggestion of partitioning the table may not help much with this query. It will turn a single sequential scan into a UNION of many tables, which may be harder for the planner to plan. Also, for queries that access small parts of the table, indexes will help more than partitioning will. Partitioning will help most in the case where you want to summarise a single year's data. Not really otherwise. Matthew -- Q: What's the difference between ignorance and apathy? A: I don't know, and I don't care. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
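Since native partitioning doesn't exist in this era, the per-year split discussed above would be simulated through inheritance; a minimal sketch, assuming the flight table has an integer year column (all names here are illustrative):

    CREATE TABLE ontime_2008 (CHECK (year = 2008)) INHERITS (ontime);
    CREATE TABLE ontime_2009 (CHECK (year = 2009)) INHERITS (ontime);

    SET constraint_exclusion = on;  -- let the planner skip excluded children
    SELECT count(*) FROM ontime WHERE year = 2009;  -- scans only ontime_2009

As argued above, this mostly pays off for queries confined to one year; a query over all years still has to visit every child table.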
Re: [PERFORM] pg_connect takes 3.0 seconds
On Wed, 6 Jan 2010, Dmitri Girski wrote: On the other hand, if I use ip addresses this should not attract any possible issues with DNS, right? Not true. It is likely that the server program you are connecting to will perform a reverse DNS lookup to work out who the client is, for logging or authentication purposes. Matthew -- To err is human; to really louse things up requires root privileges. -- Alexander Pope, slightly paraphrased -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
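Two server-side things worth checking, as a sketch - neither is confirmed to be the cause here. In postgresql.conf, log_hostname (off by default) makes every connection do a reverse lookup for the logs, and ident authentication in pg_hba.conf requires a round trip to an identd on the client:

    log_hostname = off   # postgresql.conf; avoids a reverse DNS lookup per connection

A flat 3.0 second delay smells like a DNS timeout; timing a manual reverse lookup of the client's address from the server, e.g. with dig -x, is a cheap way to test that theory.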
Re: [PERFORM] Idea how to get rid of Bitmap Heap Scan
On Fri, 18 Dec 2009, Michael N. Mikhulya wrote: The problem here is that we are forced to fetch files in the Bitmap Heap Scan. But actually there is no need for the whole files record. The necessary data is only the files ids. The idea is to avoid fetching data from the files table, and get the ids from the index! (probably it is a little bit tricky, but it is a performance area...) Unfortunately, the index does not contain enough information to accomplish this. This is due to Postgres' advanced concurrency control system. Postgres needs to fetch the actual rows from the files table in order to check whether that row is visible in the current transaction, and a Bitmap Index Scan is the fastest way to do this. You can speed this up in Postgres 8.4 by having a RAID array and setting the effective_io_concurrency configuration to the number of spindles in the RAID array, or by having gobs of RAM and keeping everything in cache. Matthew -- A good programmer is one who looks both ways before crossing a one-way street. Considering the quality and quantity of one-way streets in Cambridge, it should be no surprise that there are so many good programmers there. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
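The 8.4 setting referred to above, as a postgresql.conf sketch (it only takes effect on platforms with posix_fadvise, such as Linux):

    effective_io_concurrency = 8   # roughly the number of spindles in the array

This lets a bitmap heap scan issue several read requests ahead of time, so the RAID array can service them in parallel instead of one seek at a time.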
Re: [PERFORM] Fw: Help me put 2 Gigs of RAM to use
On Thu, 10 Dec 2009, Mark Stosberg wrote: What I'm noticing is that while the FreeBSD server has 4 Gigs of memory, there are rarely ever more than 2 in use - the memory use graphs as being rather constant. My goal is to make good use of those 2 Gigs of memory to improve performance and reduce the CPU usage. I think you'll find that the RAM is already being used quite effectively as disc cache by the OS. It sounds like the server is actually set up pretty well. You may get slightly better performance by tweaking a thing here or there, but the server needs some OS disc cache to perform well. (We currently run 8.2, but are planning an upgrade to 8.4 soon). Highly recommended. [I tried to post this yesterday but didn't see it come through. This message is a second attempt.] The mailing list server will silently chuck any message whose subject starts with the word help, just in case you're asking for help about managing the mailing list. The default behaviour is not to inform you that it has done so. It is highly annoying - could a list admin please consider changing this? Matthew -- I would like to think that in this day and age people would know better than to open executables in an e-mail. I'd also like to be able to flap my arms and fly to the moon. -- Tim Mullen -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Optimizing Bitmap Heap Scan.
On Tue, 8 Dec 2009, niraj patel wrote:

Group  (cost=509989.19..511518.30 rows=9 width=10) (actual time=1783.102..2362.587 rows=261 loops=1)
  ->  Sort  (cost=509989.19..510753.74 rows=305821 width=10) (actual time=1783.097..2121.378 rows=272211 loops=1)
        Sort Key: topfamilyid
        ->  Bitmap Heap Scan on cmrules r  (cost=14501.36..476896.34 rows=305821 width=10) (actual time=51.507..351.487 rows=272211 loops=1)
              Recheck Cond: (workspaceid = 18512::numeric)
              ->  Bitmap Index Scan on pk_ws_fea_fam_cmrules  (cost=0.00..14424.90 rows=305821 width=0) (actual time=48.097..48.097 rows=272211 loops=1)
                    Index Cond: (workspaceid = 18512::numeric)
Total runtime: 2373.008 ms
(8 rows)

select count(*) from cmrules; gives me 17 643 532 rows.

Looks good from here. Think about what you're asking the database to do. It has to select 272211 rows out of a large table with 17643532 rows. That in itself could take a very long time. It is clear that in your EXPLAIN this data is already cached, otherwise it would have to perform nigh on 272211 seeks over the discs, which would take (depending on the disc system) something on the order of twenty minutes. Those 272211 rows then have to be sorted, which takes a couple of seconds, which again is pretty good. The rows are then uniqued, which is really quick, before returning the results. It's hard to think how you would expect the database to do this any faster, really.

Indexes:
    pk_ws_fea_fam_cmrules PRIMARY KEY, btree (workspaceid, featureid, topfamilyid, ruleenddate, gid)
    idx_cmrules btree (topfamilyid)
    idx_gid_ws_cmrules btree (gid, workspaceid)

You may perhaps benefit from an index on just the workspaceid column, but the benefit may be minor. You may think of clustering the table on the index, but that will only be of benefit if the data is not in the cache. The statistics seem to be pretty accurate, predicting 305821 instead of 272211 rows. The database is not going to easily predict the number of unique results (9 instead of 261), but that doesn't affect the query plan much, so I wouldn't worry about it. I would consider upgrading to Postgres 8.4 if possible, as it does have some considerable performance improvements, especially for bitmap index scans if you are using a RAID array. I'd also try using SELECT DISTINCT rather than GROUP BY and seeing if that helps. Matthew -- Now the reason people powdered their faces back then was to change the values s and n in this equation here. - Computer science lecturer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
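The two closing suggestions, sketched against the names in the plan above (the new index name is illustrative):

    CREATE INDEX idx_ws_only_cmrules ON cmrules (workspaceid);

    SELECT DISTINCT topfamilyid FROM cmrules WHERE workspaceid = 18512;
    -- versus the GROUP BY formulation:
    SELECT topfamilyid FROM cmrules WHERE workspaceid = 18512 GROUP BY topfamilyid;

The single-column index is much narrower than the five-column primary key, so the bitmap index scan has fewer index pages to read.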
Re: [PERFORM] Optimizing Bitmap Heap Scan.
On Tue, 8 Dec 2009, niraj patel wrote: Thanks very much for the analysis. It does take 17 sec to execute when data is not in cache. It sounds like the table is already very much ordered by the workspaceid, otherwise this would have taken much longer. What I would like to ask is whether partitioning around workspaceid would help, or whether any sort of selective index would help me. Depends on how many distinct values of workspaceid there are. I would suggest that given how well ordered your table is, and if you aren't doing too many writes, then there would be little benefit, and much hassle. Matthew -- Now, you would have thought these coefficients would be integers, given that we're working out integer results. Using a fraction would seem really stupid. Well, I'm quite willing to be stupid here - in fact, I'm going to use complex numbers. -- Computer Science Lecturer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Order by (for 15 rows) adds 30 seconds to query time
On Tue, 1 Dec 2009, Jean-Michel Pouré wrote: PostgreSQL query analyzer needs to run a couple of times before it can rewrite and optimize the query. Make sure demand_id, id and join IDs carry indexes. Huh? At what point does the planner carry over previous plans and use them to further optimise the query? But perhaps the biggest factor here is calling a five table join a pretty simple query. Matthew -- Prolog doesn't have enough parentheses. -- Computer Science Lecturer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query times change by orders of magnitude as DB ages
On Wed, 25 Nov 2009, Grzegorz Jaśkiewicz wrote: the out-of-order data layout is the primary reason for index bloat. And that happens, and gets worse over time once data is more and more distributed (random deletes, etc). That's not index bloat. Sure, having the table not in the same order as the index will slow down an index scan, but that's a completely different problem altogether. Index bloat is caused by exactly the same mechanism as table bloat. The index needs to have an entry for every row in the table that may be visible by anyone. As with the table, it is not possible to deterministically delete the rows as they become non-visible, so the index (and the table) will be left with dead entries on delete and update. The vacuum command performs garbage collection and marks these dead rows and index entries as free, so that some time in the future more data can be written to those places. Index bloat is when there is an excessive amount of dead space in an index. It can be prevented by (auto)vacuuming regularly, but can only be reversed by REINDEX (or of course deleting the index, or adding loads of new entries to fill up the dead space after vacuuming). Matthew -- for a in past present future; do for b in clients employers associates relatives neighbours pets; do echo The opinions here in no way reflect the opinions of my $a $b. done; done -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
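In command form, with illustrative names - prevention versus repair:

    VACUUM ANALYZE mytable;   -- routine: marks dead rows and index entries reusable
    REINDEX INDEX myindex;    -- repair: rewrites one index without the dead space
    REINDEX TABLE mytable;    -- repair: rebuilds every index on the table

Note that REINDEX blocks writes to the table while it runs, so it belongs in a maintenance window.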
Re: [PERFORM] Strange performance degradation
On Tue, 24 Nov 2009, Denis Lussier wrote: Bouncing the app will roll back the transactions. Depends on the application. Some certainly use a shutdown hook to flush data out to a database cleanly. Obviously if you kill -9 it, then all bets are off. Matthew -- Software suppliers are trying to make their software packages more 'user-friendly' Their best approach, so far, has been to take all the old brochures, and stamp the words, 'user-friendly' on the cover. -- Bill Gates -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query times change by orders of magnitude as DB ages
On Wed, 25 Nov 2009, Richard Neill wrote: On Sun, 22 Nov 2009, Richard Neill wrote: Worse still, doing a cluster of most of the tables and vacuum full analyze Why are you doing a vacuum full? That command is not meant to be used except in the most unusual of circumstances, as it causes bloat to indexes. We'd left it too long, and the DB was reaching 90% of disk space. I didn't realise that vacuum full was ever actively bad, only sometimes unneeded. I do now - thanks for the tip. The problem is that vacuum full does a full compact of the table, but it has to update all the indexes as it goes. This makes it slow, and causes bloat to the indexes. There has been some discussion of removing the command or at least putting a big warning next to it. So, having managed to bloat the indexes in this way, what can I do to fix it? Will a regular vacuum do the job? In fact, cluster is exactly the command you are looking for. It will drop the indexes, do a complete table rewrite (in the correct order), and then recreate all the indexes again. In normal operation, a regular vacuum will keep the table under control, but if you actually want to shrink the database files in exceptional circumstances, then cluster is the tool for the job. Matthew -- Matthew: That's one of things about Cambridge - all the roads keep changing names as you walk along them, like Hills Road in particular. Sagar: Yes, Sidney Street is a bit like that too. Matthew: Sidney Street *is* Hills Road. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
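As a sketch, with illustrative names (the USING form is 8.4 syntax; older releases spell it CLUSTER myindex ON mytable):

    CLUSTER mytable USING myindex;  -- rewrite the table in index order, rebuild its indexes
    ANALYZE mytable;                -- refresh the statistics afterwards

Bear in mind that CLUSTER takes an exclusive lock on the table and needs enough free disk space for the new copy of the table while it works.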
Re: [PERFORM] [GENERAL] Strange performance degradation
On Mon, 23 Nov 2009, Lorenzo Allegrucci wrote: Anyway, how can I get rid those idle in transaction processes? Can I just kill -15 them or is there a less drastic way to do it? Are you crazy? Sure, if you want to destroy all of the changes made to the database in that transaction and thoroughly confuse the client application, you can send a TERM signal to a backend, but the consequences to your data are on your own head. Fix the application, don't tell Postgres to stop being a decent database. Matthew -- I would like to think that in this day and age people would know better than to open executables in an e-mail. I'd also like to be able to flap my arms and fly to the moon.-- Tim Mullen -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Strange performance degradation
On Tue, 24 Nov 2009, Denis Lussier wrote: IMHO the client application is already confused and it's in Prod. Shouldn't he perhaps terminate/abort the IDLE connections in Prod and work on correcting the problem so it doesn't occur in Dev/Test?? The problem is, the connection isn't just IDLE - it is idle IN TRANSACTION. This means that there is quite possibly some data that has been modified in that transaction. If you kill the backend, then that will automatically roll back the transaction, and all of those changes would be lost. I agree that correcting the problem in dev/test is the priority, but I would be very cautious about killing transactions in production. You don't know what data is uncommitted. The safest thing to do may be to bounce the application, rather than Postgres. Matthew -- All of this sounds mildly turgid and messy and confusing... but what the heck. That's what programming's all about, really -- Computer Science Lecturer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
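A way to see what is at stake before bouncing anything, using the 8.x catalog column names (procpid and current_query were renamed in later releases):

    SELECT procpid, usename, xact_start, current_query
    FROM pg_stat_activity
    WHERE current_query = '<IDLE> in transaction'
    ORDER BY xact_start;

xact_start shows how long each transaction has been sitting open; one that has been idle for hours is the smoking gun to chase in the application.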
[PERFORM] RAID card recommendation
We're about to purchase a new server to store some of our old databases, and I was wondering if someone could advise me on a RAID card. We want to make a 6-drive SATA RAID array out of 2TB drives, and it will be RAID 5 or 6 because there will be zero write traffic. The priority is stuffing as much storage into a small 2U rack as possible, with performance less important. We will be running Debian Linux. People have mentioned Areca as making good RAID controllers. We're looking at the Areca ARC-1220 PCI-Express x8 SATA II as a possibility. Does anyone have an opinion on whether it is a turkey or a star? Another possibility is a 3-ware card of some description. Thanks in advance, Matthew -- Now you see why I said that the first seven minutes of this section will have you looking for the nearest brick wall to beat your head against. This is why I do it at the end of the lecture - so I can run. -- Computer Science lecturer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Why is the query not using the index for sorting?
On Sun, 22 Nov 2009, Jonathan Blitz wrote: I have a table with a number of columns. I perform Select * from table order by a,b There is an index on a,b which is clustered (as well as indexes on a and b alone). I have issued the cluster and analyze commands. Did you analyse *after* creating the index and clustering, or before? Matthew -- [About NP-completeness] These are the problems that make efficient use of the Fairy Godmother. -- Computer Science Lecturer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query times change by orders of magnitude as DB ages
On Sun, 22 Nov 2009, Richard Neill wrote: Worse still, doing a cluster of most of the tables and vacuum full analyze Why are you doing a vacuum full? That command is not meant to be used except in the most unusual of circumstances, as it causes bloat to indexes. If you have run a cluster command, then running vacuum full will make the table and index layout worse, not better. Matthew -- Riker: Our memory pathways have become accustomed to your sensory input. Data: I understand - I'm fond of you too, Commander. And you too Counsellor -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SSD + RAID
On Thu, 19 Nov 2009, Greg Smith wrote: This is why turning the cache off can tank performance so badly - you're going to be writing a whole 128K block no matter what if it's forced to disk without caching, even if it's just to write an 8K page to it. Theoretically, this does not need to be the case. Now, I don't know what the Intel drives actually do, but remember that for flash, it is the *erase* cycle that has to be done in large blocks. Writing itself can be done in small blocks, to previously erased sites. The technology for combining small writes into sequential writes has been around for 17 years or so (see http://portal.acm.org/citation.cfm?id=146943), so there really isn't any excuse for modern flash drives not giving really fast small writes. Matthew -- for a in past present future; do for b in clients employers associates relatives neighbours pets; do echo The opinions here in no way reflect the opinions of my $a $b. done; done -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Strange performance degradation
On Fri, 20 Nov 2009, Lorenzo Allegrucci wrote: performance is degrading... In normal conditions the postgres process uses about 3% of cpu time but when it is in degraded conditions it can use up to 25% of cpu time. You don't really give enough information to determine what is going on here. This could be one of two situations: 1. You have a constant incoming stream of short-lived requests at a constant rate, and Postgres is taking eight times as much CPU to service it as normal. You're looking at CPU usage in aggregate over long periods of time. In this case, we should look at long running transactions and other slowdown possibilities. 2. You are running a complex query, and you look at top and see that Postgres uses eight times as much CPU as when it has been freshly started. In this case, the performance degradation could actually be that the data is more in cache, and postgres is able to process it eight times *faster*. Restarting Postgres kills the cache and puts you back at square one. Which of these is it? Matthew -- Reality is that which, when you stop believing in it, doesn't go away. -- Philip K. Dick -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance