Re: [PERFORM] Slow query with 3 table joins

2017-04-26 Thread Matthew Bellew
This looks like the same optimizer problem that occasionally plagues our
customers.  Whenever the estimated row count of a join is 1 but the actual
count is higher, the optimizer may choose very poor plans.  I made some
attempts to fix it.  The very simple fix is to never estimate 1 for a join
result; even using 2 works remarkably well as a defense against this problem.


https://github.com/labkey-matthewb/postgres/commit/b1fd99f4deffbbf3db2172ccaba51a34f18d1b1a

I also made a much more correct but complicated patch to track both
uniqueness and selectivity through the optimizer, but I didn't quite push
that over the finish line (I made a mistake in the hash join code, and got
distracted by my day job before finishing it).

  https://github.com/labkey-matthewb/postgres/commits/struct_selectivity

The second patch is certainly the better approach, but it needs someone to
pick up the mission.

Matt

On Wed, Apr 26, 2017 at 8:00 AM, Gerardo Herzig  wrote:

> Some other approaches you could try:
>
> 1) What about a hashed index? You could make
> CREATE INDEX ON FIELD (unit_id, hashtext(field_name))
>
> and change your query accordingly
>
> "where hashtext(FIELD.FIELD_NAME)=hashtext('SHEETS_PRESENT') "
>
> 2) Partitioning (not native yet, but can be simulated through
> inheritance), like in
> https://www.postgresql.org/docs/current/static/ddl-partitioning.html
> This could work well if you have a fairly limited number of distinct values
> in FIELD.FIELD_NAME
>
> Gerardo
>
> - Original Message -
> > From: "Alessandro Ferrucci" 
> > To: pgsql-performance@postgresql.org
> > Sent: Wednesday, April 26, 2017 0:19:37
> > Subject: Re: [PERFORM] Slow query with 3 table joins
> >
> >
> >
> > After about 40 minutes the slow query finally finished and the result
> > of the EXPLAIN plan can be found here:
> >
> >
> > https://explain.depesz.com/s/BX22
> >
> >
> > Thanks,
> > Alessandro Ferrucci
> >
> >
> > On Tue, Apr 25, 2017 at 11:10 PM, Alessandro Ferrucci <
> > alessandroferru...@gmail.com > wrote:
> >
> >
> >
> >
> > Hello - I am migrating a current system to PostgreSQL and I am having
> > an issue with a relatively straightforward query being extremely
> > slow.
> >
> >
> > The following are the definitions of the tables:
> >
> >
> > CREATE TABLE popt_2017.unit
> > (
> > id serial NOT NULL,
> > unit_id text,
> > batch_id text,
> > create_date timestamp without time zone DEFAULT now(),
> > update_date timestamp without time zone,
> > CONSTRAINT unit_pkey PRIMARY KEY (id)
> > )
> > WITH (
> > OIDS=FALSE
> > );
> >
> >
> > CREATE TABLE popt_2017.field
> > (
> > id serial NOT NULL,
> > unit_id integer,
> > subunit_data_id integer,
> > field_name character varying(50),
> > page_id character varying(20),
> > page_type character varying(20),
> > batch_id character varying(20),
> > file_name character varying(20),
> > data_concept integer,
> > "GROUP" integer,
> > omr_group integer,
> > pres integer,
> > reg_data text,
> > ocr_conf text,
> > ocr_dict text,
> > ocr_phon text,
> > create_date timestamp without time zone DEFAULT now(),
> > update_date timestamp without time zone,
> > CONSTRAINT field_pkey PRIMARY KEY (id),
> > CONSTRAINT field_subunit_data_id_fkey FOREIGN KEY (subunit_data_id)
> > REFERENCES popt_2017.subunit (id) MATCH SIMPLE
> > ON UPDATE NO ACTION ON DELETE NO ACTION,
> > CONSTRAINT field_unit_id_fk FOREIGN KEY (unit_id)
> > REFERENCES popt_2017.unit (id) MATCH FULL
> > ON UPDATE NO ACTION ON DELETE NO ACTION,
> > CONSTRAINT field_unit_id_fkey FOREIGN KEY (unit_id)
> > REFERENCES popt_2017.unit (id) MATCH SIMPLE
> > ON UPDATE NO ACTION ON DELETE NO ACTION
> > )
> > WITH (
> > OIDS=FALSE
> > );
> >
> >
> > CREATE TABLE popt_2017.answer
> > (
> > id serial NOT NULL,
> > field_id integer,
> > ans_status integer,
> > ans text,
> > luggage text,
> > arec text,
> > kfi_partition integer,
> > final boolean,
> > length integer,
> > create_date timestamp without time zone DEFAULT now(),
> > update_date timestamp without time zone,
> > CONSTRAINT answer_pkey PRIMARY KEY (id),
> > CONSTRAINT answer_field_id_fk FOREIGN KEY (field_id)
> > REFERENCES popt_2017.field (id) MATCH FULL
> > ON UPDATE NO ACTION ON DELETE NO ACTION,
> > CONSTRAINT answer_field_id_fkey FOREIGN KEY (field_id)
> > REFERENCES popt_2017.field (id) MATCH SIMPLE
> > ON UPDATE NO ACTION ON DELETE NO ACTION
> > )
> > WITH (
> > OIDS=FALSE
> > );
> >
> >
> > Below are the index definitions for those tables:
> >
> >
> > UNIT:
> > CREATE UNIQUE INDEX unit_pkey ON unit USING btree (id);
> > CREATE INDEX unit_unit_id_idx ON unit USING btree (unit_id);
> >
> >
> > FIELD:
> > CREATE UNIQUE INDEX field_pkey ON field USING btree (id)
> > CREATE INDEX field_unit_id_idx ON field USING btree (unit_id)
> > CREATE INDEX field_subunit_id_idx ON field USING btree
> > (subunit_data_id)
> > CREATE INDEX field_field_name_idx ON field USING btree (field_name)
> >
> >
> > ANSWER:
> > CREATE UNIQUE INDEX answer_pkey ON answer 

[PERFORM] Performance difference between Slon master and slave

2015-12-14 Thread Matthew Lunnon

Hello all, I hope someone can help me with this.

Postgres 9.4.4
Slon 2.2.4
Linux

I am using Slony-I to replicate a production database which is in the 
order of 70GB. I have a reasonably complex select query that runs in 40 
seconds on the master but takes in the region of 30-40 minutes on the 
slave. The postgres configurations are identical and the machines are of 
similar specification (12-core hyper-threaded HP servers; the slave has 
slightly less RAM: 132GB vs 148GB). The server running the slave database 
carries a higher load than the one running the master, though the load 
average on the slave machine was low (1-2) when running the test, and the 
postgres process on the slave runs at 100% of a CPU with very little 
iowait on the server.


Inspecting the execution plans shows some differences; for example, the 
slave uses a HashAggregate where the master simply groups, and the 
ordering of the sub-plans also differs. Armed with this knowledge I set 
enable_hashagg to off and ran the query again: it now takes 53 seconds on 
the slave, which is a more acceptable difference, and the execution plans 
now look very similar (one remaining difference being a HashAggregate in 
the master's plan which is now missing on the slave and may account for 
the 13 seconds). I have isolated a much simpler query, detailed below 
with its execution plans, which shows the difference on line 4. I would 
rather not disable hash aggregation on the slave as this might have other 
consequences, so this raises a couple of questions. Firstly, is there 
anything I can do to stop this behaviour? Secondly, why is the slave 
behaving differently from the master?
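
For reference, a minimal sketch (not part of the original mail) of scoping 
the enable_hashagg change to a single transaction on the slave rather than 
turning it off globally; the SELECT is only a stand-in for the real query:

BEGIN;
SET LOCAL enable_hashagg = off;   -- reverts automatically at COMMIT/ROLLBACK
EXPLAIN SELECT count(*) FROM booking.booking_expanded_version;
COMMIT;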


Thanks in advance for any help.

Cheers
Matthew

explain
with my_view_booking_pax_breakdown as (
SELECT bev.booking_id,
( SELECT count(*) AS count
   FROM passenger_version
  WHERE passenger_version.current_version = 'T'::bpchar AND 
passenger_version.deleted = 'F'::bpchar AND 
passenger_version.indicative_pax_type = 'A'::bpchar AND 
passenger_version.booking_id = bev.booking_id) AS adult_count,

( SELECT count(*) AS count
   FROM passenger_version
  WHERE passenger_version.current_version = 'T'::bpchar AND 
passenger_version.deleted = 'F'::bpchar AND 
passenger_version.indicative_pax_type = 'C'::bpchar AND 
passenger_version.booking_id = bev.booking_id) AS child_count,

( SELECT count(*) AS count
   FROM passenger_version
  WHERE passenger_version.current_version = 'T'::bpchar AND 
passenger_version.deleted = 'F'::bpchar AND 
passenger_version.indicative_pax_type = 'I'::bpchar AND 
passenger_version.booking_id = bev.booking_id) AS infant_count

   FROM booking_expanded_version bev
  GROUP BY bev.booking_id
)
select * from "my_view_booking_pax_breakdown" "view_booking_pax_breakdown"
INNER JOIN "booking"."booking_expanded_version" 
"booking_expanded_version" ON 
"view_booking_pax_breakdown"."booking_id"="booking_expanded_version"."booking_id" 



Master

"Merge Join  (cost=5569138.32..6158794.12 rows=2461265 width=1375)"
"  Merge Cond: (booking_expanded_version.booking_id = 
view_booking_pax_breakdown.booking_id)"

"  CTE my_view_booking_pax_breakdown"
*"->  Group  (cost=0.43..5545692.19 rows=215891 width=4)"*
"  Group Key: bev.booking_id"
"  ->  Index Only Scan using 
booking_expanded_version_booking_idx on booking_expanded_version bev  
(cost=0.43..64607.40 rows=2461265 width=4)"

"  SubPlan 1"
"->  Aggregate  (cost=8.57..8.58 rows=1 width=0)"
"  ->  Index Scan using passenger_version_idx_4 on 
passenger_version  (cost=0.43..8.55 rows=5 width=0)"

"Index Cond: (booking_id = bev.booking_id)"
"  SubPlan 2"
"->  Aggregate  (cost=8.45..8.46 rows=1 width=0)"
"  ->  Index Scan using passenger_version_idx_3 on 
passenger_version passenger_version_1  (cost=0.42..8.45 rows=1 width=0)"

"Index Cond: (booking_id = bev.booking_id)"
"  SubPlan 3"
"->  Aggregate  (cost=8.31..8.32 rows=1 width=0)"
"  ->  Index Scan using passenger_version_idx_2 on 
passenger_version passenger_version_2  (cost=0.29..8.31 rows=1 width=0)"

"Index Cond: (booking_id = bev.booking_id)"
"  ->  Index Scan using booking_expanded_version_booking_idx on 
booking_expanded_version  (cost=0.43..546584.09 rows=2461265 width=1347)"

"  ->  Sort  (cost=23445.70..23985.43 rows=215891 width=28)"
"Sort Key: view_booking_pax_breakdown.booking_id"
"

[PERFORM] Query optimizer plans with very small selectivity estimates

2015-10-29 Thread Matthew Bellew
This is related to a post in the general bugs forum, but I found this forum
and it seems more appropriate.  This is my second attempt to post; I believe
the first attempt last week did not go through, so apologies if I'm
duplicating.

http://comments.gmane.org/gmane.comp.db.postgresql.bugs/39011

I have had several users encounter performance problems, which all
seem to come down to the same issue: multiplying selectivity estimates can
cause tuple estimates to shrink very quickly.  Once the estimator gets down
to 1 row, the planner may choose plans that are very good ONLY WHEN there is
exactly 1 row, and may otherwise be very bad (maybe even O(N^large)).
Unfortunately, these may be the worst possible plans if the estimate is even
slightly off (even just returning
2 or 3 rows versus 1).

Using the patch below, I discovered that clamping relation tuple estimates
to a number as small as 2 seemed to avoid all the catastrophic query
plans.
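
As a sketch of how the experiment was driven (join_row_estimate_clamp is the
GUC added by the patch in the third artifact below, not part of stock
Postgres), the clamp can be raised for a session before re-running the
problem query:

SET join_row_estimate_clamp = 2;  -- GUC from the patch below
SHOW join_row_estimate_clamp;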

In the scenarios I'm seeing, I have several examples of queries
that take >1m to run that should run in <1s.  The estimate of 1 row
(versus thousands actual) leads the planner to tee up several nested loop
joins, which causes thousands of table scans.

I have been working on a more complete patch which tracks uniqueness along
with selectivity, so that the optimizer can benefit from knowing when a
relation must have 1 (or fewer) tuples, while clamping all other relations
to 2 rather than 1.

typedef struct
{
    double selectivity;
    bool   unique;
} Selectivity;

I am interested in hearing discussion about this problem, and if the
community
is open to a patch if I continue pursuing the development.

Matt



FIRST ARTIFACT

plan with an expensive (80s) join and a join estimate of 1;
note the first Nested Loop join and the 81s runtime
(I gave up trying to post the full explain, because of the 80 char limit)

"Sort  (cost=7000.04..7000.04 rows=1 width=49)
(actual time=81739.426..81740.023 rows=5091 loops=1)"
"  Sort Key: c.ten DESC"
"  Sort Method: quicksort  Memory: 948kB"
"  CTE cte"
"->  Values Scan on "*VALUES*"  (cost=0.00..0.06 rows=5 width=4)
(actual time=0.001..0.001 rows=5 loops=1)"
"  ->  Nested Loop  (cost=1.36..6999.97 rows=1 width=49)
(actual time=0.059..81725.475 rows=5091 loops=1)"
"Planning time: 1.912 ms"
"Execution time: 81740.328 ms"


SECOND ARTIFACT

force the join row estimate to be a minimum of 2;
the query completes very quickly

"Sort  (cost=7000.06..7000.06 rows=2 width=49)
(actual time=84.610..85.192 rows=5142 loops=1)"
"  Sort Key: c.ten DESC"
"  Sort Method: quicksort  Memory: 956kB"
"  CTE cte"
"->  Values Scan on "*VALUES*"  (cost=0.00..0.06 rows=5 width=4)
(actual time=0.002..0.003 rows=5 loops=1)"
"  ->  Hash Join  (cost=2518.99..6999.98 rows=2 width=49)
(actual time=17.629..82.886 rows=5142 loops=1)"

"Planning time: 2.982 ms"
"Execution time: 85.514 ms"


THIRD ARTIFACT

patch I used to make experimenting easier w/o recompiling


index 1b61fd9..444703c 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -68,6 +68,12 @@
  *-
  */

+
+
+/* These parameters are set by GUC */
+int join_row_estimate_clamp=1;
+
+
 #include "postgres.h"

 #ifdef _MSC_VER
@@ -175,6 +181,17 @@ clamp_row_est(double nrows)
 }


+double
+clamp_join_row_est(double nrows)
+{
+ nrows = clamp_row_est(nrows);
+ if (nrows >= (double)join_row_estimate_clamp)
+ return nrows;
+return (double)join_row_estimate_clamp;
+}
+
+
+
 /*
  * cost_seqscan
  *  Determines and returns the cost of scanning a relation sequentially.
@@ -3886,7 +3903,7 @@ calc_joinrel_size_estimate(PlannerInfo *root,
  break;
  }

- return clamp_row_est(nrows);
+ return clamp_join_row_est(nrows);
 }

 /*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 71090f2..fabb8ac 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2664,6 +2664,16 @@ static struct config_int ConfigureNamesInt[] =
  NULL, NULL, NULL
  },

+ {
+ {"join_row_estimate_clamp", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Set the minimum estimated size of a join result."),
+NULL
+ },
+ &join_row_estimate_clamp,
+ 1, 1, 1,
+ NULL, NULL, NULL
+ },
+
  /* End-of-list marker */
  {
  {NULL, 0, 0, NULL, NULL}, NULL, 0, 0, 0, NULL, NULL, NULL
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 25a7303..0161c4b 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -67,8 +67,10 @@ extern bool enable_material;
 extern bool enable_mergejoin;
 extern bool enable_hashjoin;
 extern int constraint_exclusion;
+extern int join_row_estimate_clamp;

 extern double clamp_row_est(double nrows);
+extern double clamp_join_row_est(double nrows);
 extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
  double index_pages, PlannerInfo *root);
 extern void cost_seqscan(Path *path, PlannerInfo *root, RelOptInfo
*baserel,


Re: [PERFORM] Stalls on PGSemaphoreLock

2014-04-22 Thread Matthew Spilich
Hi all - I am a little delayed in reporting back on this issue, but it was 
indeed the hugepage defrag setting that was the cause of my issue.  One item 
that we noticed as we were testing that I wanted to report back to the forum 
is that these settings

cat  /sys/kernel/mm/transparent_hugepage/defrag 
always [never]

cat  /sys/kernel/mm/transparent_hugepage/enabled 
always [never]

were not sticky on reboot for my version of CentOS, which probably explains 
why I thought this was disabled already, only to have it crop back up.  Anyway, 
I wanted to report these findings back to close the loop on this and to thank 
the community again for their support.

Best,
Matt


From: Pavy Philippe [philippe.p...@worldline.com]
Sent: Tuesday, March 25, 2014 4:10 PM
To: Matthew Spilich; pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Stalls on PGSemaphoreLock

Here, we had transparent hugepage always active:
cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
[always] never

We changed to:
cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
always [never]



For the semaphore, our initial configuration was:
cat /proc/sys/kernel/sem
250 32000 32 128

And we changed to:
cat /proc/sys/kernel/sem
5010 641280 5010 128




-----Original Message-----
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On behalf of Matthew Spilich
Sent: Tuesday, March 25, 2014 19:38
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Stalls on PGSemaphoreLock

Thanks all:

Ray:  Thanks, we started to look at the hardware/firmware, but didn't get to 
the level of detail or running sar.   I will probably collect more detail 
in this area if I continue to see issues.

Pavy - I hope that you are right that the hugepage setting is the issue.   I 
was under the impression that I had it disabled already because this has been 
a known issue for us in the past, but it turns out this was not the case for 
this server in question.   I have disabled it at this time, but it will take a 
few days of running without issue before I am comfortable declaring that this 
is the solution.   Can you elaborate on the change you mention to upgrade the 
semaphore configuration?   I think this is not something I have looked at 
before.

Ashutosh - Thanks for the reply, I started to do that at first.   I turned on 
log_statement=all for a few hours and I generated a few GB of log file, and I 
didn't want to leave it running in that state for too long because the issue 
happens every few days, and not on any regular schedule, so I reverted that 
after collecting a few GB of detail in the pg log.   What I'm doing now, 
sampling every few seconds, is I think giving me a decent picture of what is 
going on when the incident occurs, and is a level of data collection that I am 
more comfortable will not impact operations.  I am also logging at the level 
of 'mod' and all durations > 500ms.   I don't see that large write operations 
are a contributing factor leading up to these incidents.

I'm hoping that disabling the hugepage setting will be the solution to this.  
I'll check back in a day or two with feedback.

Thanks,
Matt



From: Pavy Philippe [philippe.p...@worldline.com]
Sent: Tuesday, March 25, 2014 1:45 PM
To: Ray Stell; Matthew Spilich
Cc: pgsql-performance@postgresql.org
Subject: RE : [PERFORM] Stalls on PGSemaphoreLock

Hello

Recently we had a similar problem. The first symptom was a freeze of the 
connections and 100% system CPU for 2 to 10 minutes, 1 or 2 times per day.
Connections were impossible and queries were slow. An strace on one backend 
showed a very long system call on semop().
We have a node with 48 cores and 128 GB of memory.

We disabled the hugepage setting and upgraded the semaphore configuration, and 
since that time we no longer have any freezes on our instance.

Can you check the hugepage and semaphore configuration on your node?

I am interested in this case, so do not hesitate to get back to me. Thanks.

Excuse me for my bad English!


From: pgsql-performance-ow...@postgresql.org 
[pgsql-performance-ow...@postgresql.org] on behalf of Ray Stell [ste...@vt.edu]
Sent: Tuesday, March 25, 2014 18:17
To: Matthew Spilich
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Stalls on PGSemaphoreLock

On Mar 25, 2014, at 8:46 AM, Matthew Spilich wrote:

The symptom:   The database machine (running postgres 9.1.9 on CentOS 6.4) is 
running at low utilization most of the time, but once every day or two, it will 
appear to slow down to the point where queries back up and clients are unable 
to connect.  Once this event occurs, there are lots of concurrent queries and I 
see slow queries appear in the logs, but there doesn't appear to be anything 
abnormal that I have been able

[PERFORM] semaphore waits and performance stall

2014-03-26 Thread Matthew Spilich
Hi everyone!


I've been working on a puzzling issue for a few days and am hoping that someone 
has seen something similar or can help.  There have been some odd behaviors on 
one of my production-facing postgres servers.


version info from postgres: PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, 
compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit


The symptom:   The database machine (running postgres 9.1.9 on CentOS 6.4) runs 
at low utilization most of the time, but once every day or two it will appear 
to slow down to the point where queries back up and clients are unable to 
connect.  Once this event occurs, there are lots of concurrent queries and I 
see slow queries appear in the logs, but there doesn't appear to be anything 
abnormal that I have been able to identify that causes this behavior.  The 
event lasts just long enough for monitoring to alarm.   The team will respond 
to alerts to take a look, but within a minute or three at most, load returns 
to normal levels and all running queries complete in expected times.


At the time of the event we see a spike in system CPU and load average, but we 
do not see a corresponding spike in disk reads or writes that would indicate 
IO load.   Initial troubleshooting by monitoring active processes showed a 
flurry of activity in ps waiting on semtimedop.   Our internal efforts to 
diagnose this problem are to sample pg_locks and pg_stat_activity every 5s, 
plus running a script that looks for at least one postgres process waiting on 
a semaphore; if it finds one, it gets a stack trace of every running postgres 
process with GDB.  It also uses strace on 5 processes to find out which 
semaphore they're waiting on.


What we were catching in the following stack traces seems to be representative 
of where things are waiting when we see an event - here are two representative 
examples:


- 47245 -
0x0037392eb197 in semop () from /lib64/libc.so.6
#0  0x0037392eb197 in semop () from /lib64/libc.so.6
#1  0x005e0c87 in PGSemaphoreLock ()
#2  0x0061e3af in LWLockAcquire ()
#3  0x0060aa0f in ReadBuffer_common ()
#4  0x0060b2e4 in ReadBufferExtended ()
#5  0x0047708d in _bt_relandgetbuf ()
#6  0x0047aac4 in _bt_search ()
#7  0x0047af8d in _bt_first ()
#8  0x00479704 in btgetbitmap ()
#9  0x006e7e00 in FunctionCall2Coll ()
#10 0x00473120 in index_getbitmap ()
#11 0x005726b8 in MultiExecBitmapIndexScan ()
#12 0x0057214d in BitmapHeapNext ()
#13 0x0056b18e in ExecScan ()
#14 0x00563ed8 in ExecProcNode ()
#15 0x00562d72 in standard_ExecutorRun ()
#16 0x0062ce67 in PortalRunSelect ()
#17 0x0062e128 in PortalRun ()
#18 0x0062bb66 in PostgresMain ()
#19 0x005ecd01 in ServerLoop ()
#20 0x005ef401 in PostmasterMain ()
#21 0x00590ff8 in main ()

- 47257 -
0x0037392eb197 in semop () from /lib64/libc.so.6
#0  0x0037392eb197 in semop () from /lib64/libc.so.6
#1  0x005e0c87 in PGSemaphoreLock ()
#2  0x0061e3af in LWLockAcquire ()
#3  0x0060aa0f in ReadBuffer_common ()
#4  0x0060b2e4 in ReadBufferExtended ()
#5  0x0047708d in _bt_relandgetbuf ()
#6  0x0047aac4 in _bt_search ()
#7  0x0047af8d in _bt_first ()
#8  0x004797d1 in btgettuple ()
#9  0x006e7e00 in FunctionCall2Coll ()
#10 0x0047339d in index_getnext ()
#11 0x00575ed6 in IndexNext ()
#12 0x0056b18e in ExecScan ()
#13 0x00563ee8 in ExecProcNode ()
#14 0x00562d72 in standard_ExecutorRun ()
#15 0x0062ce67 in PortalRunSelect ()
#16 0x0062e128 in PortalRun ()
#17 0x0062bb66 in PostgresMain ()
#18 0x005ecd01 in ServerLoop ()
#19 0x005ef401 in PostmasterMain ()
#20 0x00590ff8 in main ()


Has anyone on the forum seen something similar?   Any suggestions on what to 
look at next?    If it is helpful to describe the server hardware, it has 2 
E5-2670 CPUs and 256 GB of RAM, and the database is hosted on 1.6TB of RAID 10 
local storage (15K 300GB drives).   The workload is predominantly read, and the 
queries are mostly fairly simple selects from a single large table, generally 
specifying the primary key in the where clause along with a few other 
filters.


Thanks,

Matt


[PERFORM] Stalls on PGSemaphoreLock

2014-03-25 Thread Matthew Spilich
Hi everyone!


I've been working on a puzzling issue for a few days and am hoping that someone 
has seen something similar or can help.  There have been some odd behaviors on 
one of my production-facing postgres servers.


version info from postgres: PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, 
compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit


The symptom:   The database machine (running postgres 9.1.9 on CentOS 6.4) runs 
at low utilization most of the time, but once every day or two it will appear 
to slow down to the point where queries back up and clients are unable to 
connect.  Once this event occurs, there are lots of concurrent queries and I 
see slow queries appear in the logs, but there doesn't appear to be anything 
abnormal that I have been able to identify that causes this behavior.  The 
event lasts just long enough for monitoring to alarm.   We will respond to 
alerts to take a look, but within a minute or three at most, load returns 
to normal levels and all running queries complete in expected times.


At the time of the event we see a spike in system CPU and load average, but we 
do not see a corresponding spike in disk reads or writes that would indicate 
IO load.   Initial troubleshooting by monitoring active processes showed a 
flurry of activity in ps waiting on semtimedop.   Our internal efforts to 
diagnose this problem are to sample pg_locks and pg_stat_activity every 5s, 
plus running a script that looks for at least one postgres process waiting on 
a semaphore; if it finds one, it gets a stack trace of every running postgres 
process with GDB.  It also uses strace on 5 processes to find out which 
semaphore they're waiting on.


What we were catching in the following stack traces seems to be representative 
of where things are waiting when we see an event - here are two representative 
examples (lots of threads will appear to be in this state):


- 47245 -
0x0037392eb197 in semop () from /lib64/libc.so.6
#0  0x0037392eb197 in semop () from /lib64/libc.so.6
#1  0x005e0c87 in PGSemaphoreLock ()
#2  0x0061e3af in LWLockAcquire ()
#3  0x0060aa0f in ReadBuffer_common ()
#4  0x0060b2e4 in ReadBufferExtended ()
#5  0x0047708d in _bt_relandgetbuf ()
#6  0x0047aac4 in _bt_search ()
#7  0x0047af8d in _bt_first ()
#8  0x00479704 in btgetbitmap ()
#9  0x006e7e00 in FunctionCall2Coll ()
#10 0x00473120 in index_getbitmap ()
#11 0x005726b8 in MultiExecBitmapIndexScan ()
#12 0x0057214d in BitmapHeapNext ()
#13 0x0056b18e in ExecScan ()
#14 0x00563ed8 in ExecProcNode ()
#15 0x00562d72 in standard_ExecutorRun ()
#16 0x0062ce67 in PortalRunSelect ()
#17 0x0062e128 in PortalRun ()
#18 0x0062bb66 in PostgresMain ()
#19 0x005ecd01 in ServerLoop ()
#20 0x005ef401 in PostmasterMain ()
#21 0x00590ff8 in main ()

- 47257 -
0x0037392eb197 in semop () from /lib64/libc.so.6
#0  0x0037392eb197 in semop () from /lib64/libc.so.6
#1  0x005e0c87 in PGSemaphoreLock ()
#2  0x0061e3af in LWLockAcquire ()
#3  0x0060aa0f in ReadBuffer_common ()
#4  0x0060b2e4 in ReadBufferExtended ()
#5  0x0047708d in _bt_relandgetbuf ()
#6  0x0047aac4 in _bt_search ()
#7  0x0047af8d in _bt_first ()
#8  0x004797d1 in btgettuple ()
#9  0x006e7e00 in FunctionCall2Coll ()
#10 0x0047339d in index_getnext ()
#11 0x00575ed6 in IndexNext ()
#12 0x0056b18e in ExecScan ()
#13 0x00563ee8 in ExecProcNode ()
#14 0x00562d72 in standard_ExecutorRun ()
#15 0x0062ce67 in PortalRunSelect ()
#16 0x0062e128 in PortalRun ()
#17 0x0062bb66 in PostgresMain ()
#18 0x005ecd01 in ServerLoop ()
#19 0x005ef401 in PostmasterMain ()
#20 0x00590ff8 in main ()


Has anyone on the forum seen something similar?   Any suggestions on what to 
look at next?    If it is helpful to describe the server hardware, it has 2 
E5-2670 CPUs and 256 GB of RAM, and the database is hosted on 1.6TB of RAID 10 
local storage (15K 300GB drives).   The workload is predominantly read, and the 
queries are mostly fairly simple selects from a single large table, generally 
specifying the primary key in the where clause along with a few other 
filters.


Thanks,

Matt


Re: [PERFORM] Stalls on PGSemaphoreLock

2014-03-25 Thread Matthew Spilich
Thanks all: 

Ray:  Thanks, we started to look at the hardware/firmware, but didn't get to 
the level of detail or running sar.   I will probably collect more detail 
in this area if I continue to see issues.

Pavy - I hope that you are right that the hugepage setting is the issue.   I 
was under the impression that I had it disabled already because this has been 
a known issue for us in the past, but it turns out this was not the case for 
this server in question.   I have disabled it at this time, but it will take a 
few days of running without issue before I am comfortable declaring that this 
is the solution.   Can you elaborate on the change you mention to upgrade the 
semaphore configuration?   I think this is not something I have looked at 
before.

Ashutosh - Thanks for the reply, I started to do that at first.   I turned on 
log_statement=all for a few hours and I generated a few GB of log file, and I 
didn't want to leave it running in that state for too long because the issue 
happens every few days, and not on any regular schedule, so I reverted that 
after collecting a few GB of detail in the pg log.   What I'm doing now, 
sampling every few seconds, is I think giving me a decent picture of what is 
going on when the incident occurs, and is a level of data collection that I am 
more comfortable will not impact operations.  I am also logging at the level 
of 'mod' and all durations > 500ms.   I don't see that large write operations 
are a contributing factor leading up to these incidents.

I'm hoping that disabling the hugepage setting will be the solution to this.  
I'll check back in a day or two with feedback.

Thanks,
Matt



From: Pavy Philippe [philippe.p...@worldline.com]
Sent: Tuesday, March 25, 2014 1:45 PM
To: Ray Stell; Matthew Spilich
Cc: pgsql-performance@postgresql.org
Subject: RE : [PERFORM] Stalls on PGSemaphoreLock

Hello

Recently we had a similar problem. The first symptom was a freeze of the 
connections and 100% system CPU for 2 to 10 minutes, 1 or 2 times per day.
Connections were impossible and queries were slow. An strace on one backend 
showed a very long system call on semop().
We have a node with 48 cores and 128 GB of memory.

We disabled the hugepage setting and upgraded the semaphore configuration, and 
since that time we no longer have any freezes on our instance.

Can you check the hugepage and semaphore configuration on your node?

I am interested in this case, so do not hesitate to get back to me. Thanks.

Excuse me for my bad English!


From: pgsql-performance-ow...@postgresql.org 
[pgsql-performance-ow...@postgresql.org] on behalf of Ray Stell [ste...@vt.edu]
Sent: Tuesday, March 25, 2014 18:17
To: Matthew Spilich
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Stalls on PGSemaphoreLock

On Mar 25, 2014, at 8:46 AM, Matthew Spilich wrote:

The symptom:   The database machine (running postgres 9.1.9 on CentOS 6.4) is 
running at low utilization most of the time, but once every day or two, it will 
appear to slow down to the point where queries back up and clients are unable 
to connect.  Once this event occurs, there are lots of concurrent queries and I 
see slow queries appear in the logs, but there doesn't appear to be anything 
abnormal that I have been able to see that causes this behavior.
...
Has anyone on the forum seen something similar?   Any suggestions on what to 
look at next?    If it is helpful to describe the server hardware, it has 2 
E5-2670 CPUs and 256 GB of RAM, and the database is hosted on 1.6TB of RAID 10 
local storage (15K 300GB drives).


I could be way off here, but years ago I experienced something like this (in 
oracle land) and after some stressful chasing, the marginal failure of the raid 
controller revealed itself.  Same kind of event, steady traffic and then some 
i/o would not complete and normal ops would stack up.  Anyway, what you report 
reminded me of that event.  The E5 is a few years old, I wonder if the raid 
controller firmware needs a patch?  I suppose a marginal power supply might 
cause a similar hang.  Anyway, marginal failures are very painful.  Have you 
checked sar or OS logging at event time?



Re: [PERFORM] DELETE vs TRUNCATE explanation

2012-07-11 Thread Matthew Woodcraft
Tom Lane wrote:
 (3) The performance of the truncation itself should not be viewed in
 isolation; subsequent behavior also needs to be considered.  An example
 of possible degradation is that index bloat would no longer be
 guaranteed to be cleaned up over a series of repeated truncations.
 (You might argue that if the table is small then the indexes couldn't
 be very bloated, but I don't think that holds up over a long series.)

 IOW, I think it's fine as-is.  I'd certainly wish to see many more
 than one complainant before we expend effort in this area.

I think a documentation change would be worthwhile.

At the moment the TRUNCATE page says, with no caveats, that it is faster than
unqualified DELETE.

It surprised me to find that this wasn't true (with 7.2, again with small
tables in a testsuite), and evidently it's still surprising people today.

-M-



Re: [PERFORM] Sorted group by

2010-08-11 Thread Matthew Wakeling
 tracker_objectid on tracker b
 (cost=0.00..1096747.23 rows=3616658 width=17)
 (actual time=0.084..5781.844 rows=3790872 loops=1)
   Index Cond: (objectid  120)
 Total runtime: 55756.482 ms
(14 rows)

On Tue, 10 Aug 2010, Jonathan Blitz wrote:

Select groupfield,value
From tbl x1
Where number = (select max(number) from tbl x2 where x2.groupfield=
x1.groupfield)


This one effectively forces a nested loop join:

explain analyse select objectid, fieldname, sourcename from tracker as a 
where version = (select max(version) from tracker as b where a.objectid = 
b.objectid and a.fieldname = b.fieldname) and a.objectid  120;


   QUERY PLAN
-
 Index Scan using tracker_objectid on tracker a
   (cost=0.00..58443381.75 rows=18083 width=34)
   (actual time=6.482..59803.225 rows=1802376 loops=1)
   Index Cond: (objectid  120)
   Filter: (version = (SubPlan 2))
   SubPlan 2
 ->  Result  (cost=15.89..15.90 rows=1 width=0)
 (actual time=0.011..0.012 rows=1 loops=3790872)
   InitPlan 1 (returns $2)
 ->  Limit  (cost=0.00..15.89 rows=1 width=4)
(actual time=0.007..0.008 rows=1 loops=3790872)
   ->  Index Scan Backward using tracker_all on tracker b
(cost=0.00..31.78 rows=2 width=4)
(actual time=0.005..0.005 rows=1 loops=3790872)
 Index Cond: (($0 = objectid) AND ($1 = fieldname))
 Filter: (version IS NOT NULL)
 Total runtime: 61649.116 ms
(11 rows)

On Tue, 10 Aug 2010, Jonathan Blitz wrote:

Select groupfield,value
From tbl x1
Where (groupfield,number) in (select groupfield,max(number) from tbl group
by groupfield)


This is another join.

explain analyse select objectid, fieldname, sourcename from tracker where 
(objectid, fieldname, version) in (select objectid, fieldname, 
max(version) from tracker group by objectid, fieldname);


I terminated this query after about an hour. Here is the EXPLAIN:

  QUERY PLAN

 Hash Join  (cost=55310973.80..72060974.32 rows=55323 width=30)
   Hash Cond: ((public.tracker.objectid = public.tracker.objectid)
   AND (public.tracker.fieldname = public.tracker.fieldname)
   AND (public.tracker.version = (max(public.tracker.version))))
   ->  Seq Scan on tracker  (cost=0.00..5310600.80 rows=293972480 width=34)
   ->  Hash  (cost=54566855.96..54566855.96 rows=29397248 width=40)
 ->  GroupAggregate
 (cost=50965693.08..54272883.48 rows=29397248 width=17)
   ->  Sort
   (cost=50965693.08..51700624.28 rows=293972480 width=17)
 Sort Key: public.tracker.objectid, public.tracker.fieldname
 ->  Seq Scan on tracker
 (cost=0.00..5310600.80 rows=293972480 width=17)
(8 rows)

Matthew

--
I quite understand I'm doing algebra on the blackboard and the usual response
is to throw objects...  If you're going to freak out... wait until party time
and invite me along -- Computer Science Lecturer



[PERFORM] Sorted group by

2010-08-10 Thread Matthew Wakeling


I'm trying to eke a little bit more performance out of an application, and 
I was wondering if there was a better way to do the following:


I am trying to retrieve, for many sets of rows grouped on a couple of 
fields, the value of an ungrouped field where the row has the highest 
value in another ungrouped field. For instance, I have the following table 
setup:


group  | whatever type
value  | whatever type
number | int
Index: group

I then have rows like this:

group | value | number
-
Foo   | foo   | 1
Foo   | turnips   | 2
Bar   | albatross | 3
Bar   | monkey| 4

I want to receive results like this:

group | value
---
Foo   | turnips
Bar   | monkey

Currently, I do this in my application by ordering by the number and only 
using the last value. I imagine that this is something that can be done in 
the new Postgres 9, with a sorted group by - something like this:


SELECT group, LAST(value, ORDER BY number) FROM table GROUP BY group

Is this something that is already built in, or would I have to write my 
own LAST aggregate function?
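
For illustration, such an aggregate can be hand-rolled in a few lines (a 
sketch, not something shipped with Postgres; the table name tbl is 
illustrative, and "group" is quoted because it is a reserved word):

CREATE FUNCTION last_agg(anyelement, anyelement) RETURNS anyelement
    LANGUAGE sql IMMUTABLE STRICT AS 'SELECT $2';

CREATE AGGREGATE last(anyelement) (
    SFUNC = last_agg,
    STYPE = anyelement
);

-- ORDER BY inside an aggregate call requires 9.0:
SELECT "group", last(value ORDER BY number) FROM tbl GROUP BY "group";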


Matthew

--
The third years are wandering about all worried at the moment because they
have to hand in their final projects. Please be sympathetic to them, say
things like ha-ha-ha, but in a sympathetic tone of voice 
   -- Computer Science Lecturer




Re: [PERFORM] Sorted group by

2010-08-10 Thread Matthew Wakeling

On Tue, 10 Aug 2010, Thomas Kellerer wrote:

No. It's built in (8.4) and it's called Windowing functions:
http://www.postgresql.org/docs/8.4/static/tutorial-window.html
http://www.postgresql.org/docs/8.4/static/functions-window.html

SELECT group, last_value(value) over(ORDER BY number)
FROM table


I may be mistaken, but as I understand it, a windowing function doesn't 
reduce the number of rows in the results?
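
If the goal is one row per group, two sketches that do reduce the rows (not 
from the thread; the table and column names follow the example in the 
original post, with "group" quoted because it is reserved):

SELECT DISTINCT ON ("group") "group", value
FROM tbl
ORDER BY "group", number DESC;

SELECT "group", value
FROM (SELECT "group", value,
             row_number() OVER (PARTITION BY "group" ORDER BY number DESC) AS rn
      FROM tbl) AS ranked
WHERE rn = 1;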


Matthew

--
Don't worry about people stealing your ideas. If your ideas are any good,
you'll have to ram them down people's throats. -- Howard Aiken



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-06 Thread Matthew Wakeling

On Thu, 5 Aug 2010, Scott Marlowe wrote:

RAID6 is basically RAID5 with a hot spare already built into the
array.


On Fri, 6 Aug 2010, Pierre C wrote:

As others said, RAID6 is RAID5 + a hot spare.


No. RAID6 is NOT RAID5 plus a hot spare.

RAID5 uses a single parity datum (XOR) to ensure protection against data 
loss if one drive fails.


RAID6 uses two different sets of parity (Reed-Solomon) to ensure 
protection against data loss if two drives fail simultaneously.


If you have a RAID5 set with a hot spare, and you lose two drives, then 
you have data loss. If the same happens to a RAID6 set, then there is no 
data loss.


Matthew

--
And the lexer will say Oh look, there's a null string. Oooh, there's 
another. And another., and will fall over spectacularly when it realises

there are actually rather a lot.
- Computer Science Lecturer (edited)



Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Matthew Wakeling

On Sun, 25 Jul 2010, Yeb Havinga wrote:
Graph of TPS at http://tinypic.com/r/b96aup/3 and latency at 
http://tinypic.com/r/x5e846/3


Does your latency graph really have milliseconds as the y axis? If so, 
this device is really slow - some requests have a latency of more than a 
second!


Matthew

--
The early bird gets the worm, but the second mouse gets the cheese.



Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Matthew Wakeling

On Mon, 26 Jul 2010, Greg Smith wrote:

Matthew Wakeling wrote:
Does your latency graph really have milliseconds as the y axis? If so, this 
device is really slow - some requests have a latency of more than a second!


Have you tried that yourself?  If you generate one of those with standard 
hard drives and a BBWC under Linux, I expect you'll discover those latencies 
to be 5 seconds long.  I recently saw 100 *seconds* running a large pgbench 
test due to latency flushing things to disk, on a system with 72GB of RAM. 
Takes a long time to flush 3GB of random I/O out to disk when the kernel 
will happily cache that many writes until checkpoint time.


Apologies, I was interpreting the graph as the latency of the device, not 
all the layers in-between as well. There isn't any indication in the email 
with the graph as to what the test conditions or software are. Obviously 
if you factor in checkpoints and the OS writing out everything, then you 
would have to expect some large latency operations. However, if the device 
itself behaved as in the graph, I would be most unhappy and send it back.


Yeb also made the point - there are far too many points on that graph to 
really tell what the average latency is. It'd be instructive to have a few 
figures, like only x% of requests took longer than y.


Matthew

--
I wouldn't be so paranoid if you weren't all out to get me!!



Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-12 Thread Matthew Wakeling

On Sat, 10 Jul 2010, Tom Lane wrote:

Doesn't pgpool do this?


No, and in fact that's exactly why the proposed implementation isn't
ever going to be in core: it's not possible to do it portably.


I'm surprised. Doesn't apache httpd do this? Does it have to do a whole 
load of non-portable stuff? It seems to work on a whole load of platforms.


Matthew

--
I would like to think that in this day and age people would know better than
to open executables in an e-mail. I'd also like to be able to flap my arms
and fly to the moon.-- Tim Mullen



Re: [PERFORM] Need help in performance tuning.

2010-07-09 Thread Matthew Wakeling

On Fri, 9 Jul 2010, Kevin Grittner wrote:

Any thoughts on the minimalist solution I suggested a couple weeks
ago?:

http://archives.postgresql.org/pgsql-hackers/2010-06/msg01385.php
http://archives.postgresql.org/pgsql-hackers/2010-06/msg01387.php

So far, there has been no comment by anyone


Interesting idea. As far as I can see, you are suggesting solving the too 
many connections problem by allowing lots of connections, but only 
allowing a certain number to do anything at a time?


A proper connection pool provides the following advantages over this:

1. Pool can be on a separate machine or machines, spreading load.
2. Pool has a lightweight footprint per connection, whereas Postgres
   doesn't.
3. A large amount of the overhead is sometimes connection setup, which
   this would not solve. A pool has cheap setup.
4. This could cause Postgres backends to be holding onto large amounts of
   memory while being prevented from doing anything, which is a bad use of
   resources.
5. A fair amount of the overhead is caused by context-switching between
   backends. The more backends, the less useful any CPU caches.
6. There are some internal workings of Postgres that involve keeping all
   the backends informed about something going on. The more backends, the
   greater this overhead is. (This was pretty bad with the sinval queue
   overflowing a while back, but a bit better now. It still causes some
   overhead).
7. That lock would have a metric *($!-load of contention.

Matthew

--
Unfortunately, university regulations probably prohibit me from eating
small children in front of the lecture class.
   -- Computer Science Lecturer



Re: [PERFORM] Need help in performance tuning.

2010-07-09 Thread Matthew Wakeling

On Fri, 9 Jul 2010, Kevin Grittner wrote:

Interesting idea. As far as I can see, you are suggesting solving
the too many connections problem by allowing lots of connections,
but only allowing a certain number to do anything at a time?


Right.


I think in some situations, this arrangement would be an advantage. 
However, I do not think it will suit the majority of situations, and could 
reduce the performance when the user doesn't need the functionality, 
either because they have a pool already, or they don't have many 
connections.


No, I don't have any numbers.


1. Pool can be on a separate machine or machines, spreading load.


Sure, but how would you do that with a built-in implementation?


That's my point exactly. If you have an external pool solution, you can 
put it somewhere else - maybe on multiple somewhere elses.



3. A large amount of the overhead is sometimes connection setup,
   which this would not solve. A pool has cheap setup.


This would probably be most useful where the client held a
connection for a long time, not for the login for each database
transaction approach.  I'm curious how often you think application
software uses that approach.


What you say is true. I don't know how often that is, but it seems to be 
those times that people come crying to the mailing list.



4. This could cause Postgres backends to be holding onto large
   amounts of memory while being prevented from doing anything,
   which is a bad use of resources.


Isn't this point 2 again?


Kind of. Yes. Point 2 was simple overhead. This point was that the backend 
may have done a load of query-related allocation, and then been stopped.



7. That lock would have a metric *($!-load of contention.


Here I doubt you.  It would be held for such short periods that I
suspect that collisions would be relatively infrequent compared to
some of the other locks we use.  As noted in the email, it may
actually normally be an increment and test within an existing
locked block.


Fair enough. It may be much less of a problem than I had previously 
thought.


Matthew

--
Change is inevitable, except from vending machines.



Re: [PERFORM] Two different execution plan for the same request

2010-07-07 Thread Matthew Wakeling

On Wed, 7 Jul 2010, JOUANIN Nicolas (44) wrote:

It seems to work fine (same execution plan and less duration) after :
 - setting default_statistics_target to 100
 - full vacuum with analyze


Don't do VACUUM FULL.
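
A plain VACUUM ANALYZE refreshes the statistics without VACUUM FULL's 
exclusive lock and table rewrite; a minimal sketch (not from the original 
mail):

SET default_statistics_target = 100;  -- affects ANALYZE runs in this session
VACUUM ANALYZE;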

Matthew

--
I suppose some of you have done a Continuous Maths course. Yes? Continuous
Maths? menacing stares from audience Whoah, it was like that, was it!
   -- Computer Science Lecturer


Re: [PERFORM] how to (temporarily) disable/minimize benefits of disk block cache or postgresql shared buffer

2010-07-05 Thread Matthew Wakeling

On Fri, Jul 2, 2010 at 8:30 AM, Craig Ringer cr...@postnewspapers.com.au wrote:

Yeah, if you're in a weird virtualized environment like that you're
likely to have problems...


On Sat, 3 Jul 2010, Rajesh Kumar Mallah wrote:

Thanks for thinking about it. I do not understand why you feel OpenVz is weird;
at most, it's not very popular.


It's not OpenVz that is weird, but virtualisation in general. If you are 
running in a virtual machine, then all sorts of things will not run as 
well as expected.


Matthew

--
Contrary to popular belief, Unix is user friendly. It just happens to be
very selective about who its friends are. -- Kyle Hearn



Re: [PERFORM] Highly Efficient Custom Sorting

2010-07-02 Thread Matthew Wakeling

On Fri, Jul 2, 2010 at 12:08 AM, Tom Lane t...@sss.pgh.pa.us wrote:

I'm guessing from tea leaves, but the impression I got from Eliot's
description is that he's using plpgsql functions as sort comparators.
It's not surprising that that sucks performance-wise compared to having
the equivalent logic in C/C++ functions used as comparators on the
client side.  plpgsql is no speed demon.  Best fix might be to code the
comparators as C functions on the server side.


On Fri, 2 Jul 2010, Eliot Gable wrote:

I guess the real question is, is a generic C sorting function my only real
alternative?


Sounds to me like you are not really listening. You don't need to code an 
entire sorting algorithm in C, as Postgres already has a pretty good one 
of those. All you need to do is implement a comparator of some kind. 
Inserting C functions into Postgres is pretty easy, especially on the 
level of comparators.
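
As a rough sketch of the SQL-level glue (the type, library and symbol names 
here are made up for illustration), a comparator written in C and compiled 
into a shared library is registered like this, and can then back an operator 
class or be called from ORDER BY:

CREATE FUNCTION my_type_cmp(my_type, my_type) RETURNS integer
    AS 'my_extension', 'my_type_cmp'
    LANGUAGE C IMMUTABLE STRICT;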


Matthew

--
For those of you who are into writing programs that are as obscure and
complicated as possible, there are opportunities for... real fun here
   -- Computer Science Lecturer



Re: [PERFORM] ideal storage configuration

2010-06-30 Thread Matthew Wakeling

On Tue, 29 Jun 2010, Samuel Gendler wrote:
The copy statements execute in a small fraction of the minute in which 
they occur.


I'm going to ask a silly question here. If the system is already coping 
quite well with the load, then why are you changing it?


All old data gets removed by dropping older partitions.  There are no 
updates at all.


That's good.


6 internal drives on battery backed raid (I don't know what RAID level
- is there a way to discover this?), all in a single filesystem, so
WAL and data are on the same filesystem.  I don't believe that we are
taking advantage of the battery backed controller, since I only see
this in /etc/fstab:

UUID=12dcd71d-8aec-4253-815c-b4883195eeb8 /   ext3
  defaults1 1


That doesn't have anything to do with whether or not the controller has a 
BBU cache. If the controller does have a BBU cache, then your writes will 
return quicker - and nothing else.



But inserts are happening so rapidly that I don't imagine that getting
rid of fsync is going to change performance of the reporting queries
too dramatically.


Don't get rid of fsync, unless you want to lose your data. Especially with 
your workload of large transactions, you do not need the benefit of 
reducing the transaction latency, and even that benefit is not present if 
you have a BBU cache.


It seems like your current disc array is coping quite well with the write 
traffic. If you really want to boost your read speeds for your reporting 
queries, then increase the amount of RAM, as Kevin said, and see if you 
can fit the active portion of the database into RAM.


Matthew

--
Nog: Look! They've made me into an ensign!
O'Brien: I didn't know things were going so badly.
Nog: Frightening, isn't it?



Re: [PERFORM] WAL+Os on a single disk

2010-06-24 Thread Matthew Wakeling

On Wed, 23 Jun 2010, Scott Marlowe wrote:

We have a 12 x 600G hot swappable disk system (raid 10)
and 2 internal disk  ( 2x 146G)

Does it make sense to put the WAL and OS on the internal disks


So for us, the WAL and OS and logging on the same data set works well.


Generally, it is recommended that you put the WAL onto a separate disc to 
the data. However, in this case, I would be careful. It may be that the 12 
disc array is more capable. Specifically, it is likely that the 12-disc 
array has a battery backed cache, but the two internal drives (RAID 1 
presumably) do not. If this is the case, then putting the WAL on the 
internal drives will reduce performance, as you will only be able to 
commit a transaction once per revolution of the internal discs. In 
contrast, if the WAL is on a battery backed cache array, then you can 
commit much more frequently.


Test it and see.

Matthew

--
I don't want the truth. I want something I can tell parliament!
 -- Rt. Hon. Jim Hacker MP


Re: [PERFORM] Write performance

2010-06-24 Thread Matthew Wakeling

On Thu, 24 Jun 2010, Janning wrote:

We have a 12 GB RAM machine with intel i7-975 and using
3 disks Seagate Barracuda 7200.11, ST31500341AS (1.5 GB)


Those discs are 1.5TB, not 1.5GB.


One disk for the system and WAL etc. and one SW RAID-0 with two disks for
postgresql data. Our database is about 24GB.


Beware of RAID-0 - make sure you can recover the data when (not if) a disc 
fails.



Our munin graph reports at 9:00 a clock writes of 3000 blocks per second and
reads of about 1000 blocks per second on our disk which holds the data
directories of postgresql (WAL are on a different disk)

3000 blocks ~ about 3 MB/s write
1000 blocks ~ about 1 MB/s read

At the same time we have nearly 50% CPU I/O wait and only 12% user CPU load
(so 4 of 8 cpu cores are in use for io wait)


Not quite sure what situation you are measuring these figures under. 
However, as a typical figure, let's say you are doing random access with 
8kB blocks (as in Postgres), and the access time on your drive is 8.5ms 
(as with these drives).


For each drive, you will be able to read/write approximately 8kB / 
0.0085s, giving 941kB per second. If you have multiple processes all doing 
random access, then you may be able to utilise both discs and get double 
that.



Of course, writing large chunks is quite a different usage pattern. But I am
wondering that writing 3MB/s and reading 1 MB/s seams to be a limit if i can
run a test with 89 MB/s writing and 110MB/s reading.


That's quite right, and typical performance figures for a drive like that.

Matthew

--
Don't criticise a man until you have walked a mile in his shoes; and if
you do at least he will be a mile behind you and bare footed.



Re: [PERFORM] raid10 write performance

2010-06-23 Thread Matthew Wakeling

On Wed, 23 Jun 2010, Ivan Voras wrote:

On 06/23/10 14:00, Florian Weimer wrote:

Barrier support on RAID10 seems to require some smallish amount of
non-volatile storage which supports a high number of write operations
per second, so a software-only solution might not be available.


If I understand you correctly, this can be said in general for all
spinning-disk usage and is not specific to RAID10. (And in the case of
high, constant TPS, no amount of NVRAM will help you).


No. Write barriers work fine with a single disc, assuming it is set up 
correctly. The barrier is a command telling the disc to make sure that one 
piece of data is safe before starting to write another piece of data.


However, as soon as you have multiple discs, the individual discs do not 
have a way of communicating with each other to make sure that the first 
piece of data is written before the other. That's why you need a little 
bit of non-volatile storage to mediate that to properly support barriers.


Of course, from a performance point of view, yes, you need some NVRAM on 
any kind of spinning storage to maintain high commit rates.


Matthew

--
I wouldn't be so paranoid if you weren't all out to get me!!

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-18 Thread Matthew Wakeling

Dimitri Fontaine wrote:

Well I guess I'd prefer a per-transaction setting


Not possible, as many others have said. As soon as you make an unsafe 
transaction, all the other transactions have nothing to rely on.


On Thu, 17 Jun 2010, Pierre C wrote:
A per-table (or per-index) setting makes more sense IMHO. For instance "on 
recovery, truncate this table" (this was mentioned before).


That would be much more valuable.

I'd like to point out the costs involved in having a whole separate 
version of Postgres that has all this safety switched off. Package 
managers will not thank anyone for having to distribute another version of 
the system, and woe betide the user who installs the wrong version because 
it runs faster. No, this is much better as a configurable option.


Going back to the "on recovery, truncate this table" idea. We already have a 
mechanism for skipping the WAL writes on an entire table - we do that for 
tables that have been created in the current transaction. It would surely 
be a small step to allow this to be configurably permanent on a particular 
table.


Moreover, we already have a mechanism for taking a table that has had 
non-logged changes, and turning it into a fully logged table - we do that 
to the above mentioned tables when the transaction commits. I would 
strongly recommend providing an option to ALTER TABLE MAKE SAFE, which may 
involve some more acrobatics if the table is currently in use by multiple 
transactions, but would be valuable.


This would allow users to create temporary tables that can be shared by 
several connections. It would also allow bulk loading in parallel of a 
single large table.
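
(For readers finding this thread later: this suggestion is essentially what 
became unlogged tables - CREATE UNLOGGED TABLE arrived in PostgreSQL 9.1 and 
ALTER TABLE ... SET LOGGED / SET UNLOGGED in 9.5. A minimal sketch against 
such a server, with invented table, column and file names:)

CREATE UNLOGGED TABLE load_stage (id integer, payload text);
-- bulk load without per-row WAL; an unlogged table is truncated on crash recovery
COPY load_stage FROM '/tmp/payload.csv' CSV;
-- then turn it into an ordinary, crash-safe table (this rewrites the table)
ALTER TABLE load_stage SET LOGGED;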


With these suggestions, we would still need to WAL-log all the metadata 
changes, but I think in most circumstances that is not going to be a large 
burden on performance.


Matthew

--
Picard: I was just paid a visit from Q.
Riker:  Q! Any idea what he's up to?
Picard: No. He said he wanted to be nice to me.
Riker:  I'll alert the crew.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] B-Heaps

2010-06-18 Thread Matthew Wakeling

On Fri, 18 Jun 2010, Robert Haas wrote:

On Tue, Jun 15, 2010 at 8:23 AM, Matthew Wakeling matt...@flymine.org wrote:

Absolutely, and I said in
http://archives.postgresql.org/pgsql-performance/2010-03/msg00272.php
but applied to the Postgres B-tree indexes instead of heaps.


This is an interesting idea.  I would guess that you could simulate
this to some degree by compiling PG with a larger block size.  Have
you tried this to see whether/how much/for what kind of workloads it
helps?


To a degree, that is the case. However, if you follow the thread a bit 
further back, you will find evidence that when the index is in memory, 
increasing the page size actually decreases the performance, because it 
uses more CPU.


To make it clear - 8kB is not an optimal page size for either fully cached 
data or sparsely cached data. For disc access, large pages are 
appropriate, on the order of 256kB. If the page size is much lower than 
that, then the time taken to fetch it doesn't actually decrease much, and 
we are trying to get the maximum amount of work done per fetch without 
slowing fetches down significantly.


Given such a large page size, it would then be appropriate to have a 
better data structure inside the page. Currently, our indexes (at least 
the GiST ones - I haven't looked at the Btree ones) use a simple linear 
array in the index page. Using a proper tree inside the index page would 
improve the CPU usage of the index lookups.


One detail that would need to be sorted out is the cache eviction policy. 
I don't know if it is best to evict whole 256kB pages, or to evict 8kB 
pages. Probably the former, which would involve quite a bit of change to 
the shared memory cache. I can see the cache efficiency decreasing as a 
result of this, which is the only disadvantage I can see.


This sort of thing has been fairly well researched at an academic level, 
but has not been implemented in that many real world situations. I would 
encourage its use in Postgres.


Matthew

--
Failure is not an option. It comes bundled with your Microsoft product. 
-- Ferenc Mantfeld


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Parallel queries for a web-application |performance testing

2010-06-17 Thread Matthew Wakeling

On Wed, 16 Jun 2010, Balkrishna Sharma wrote:
Hello, I will have a web application having postgres 8.4+ as backend. At 
any given time, there will be max of 1000 parallel web-users interacting 
with the database (read/write). I wish to do performance testing of 1000 
simultaneous read/write to the database.


When you set up a server that has high throughput requirements, the last 
thing you want to do is use it in a manner that cripples its throughput. 
Don't try and have 1000 parallel Postgres backends - it will process those 
queries slower than the optimal setup. You should aim to have 
approximately ((2 * cpu core count) + effective spindle count) number of 
backends, as that is the point at which throughput is the greatest. You 
can use pgbouncer to achieve this.
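
As a worked example of that formula: a box with 8 cores and 4 data spindles 
would aim for (2 * 8) + 4 = 20 backends. A minimal pgbouncer sketch along 
those lines follows - every name, path and number in it is illustrative 
rather than taken from the original post:

[databases]
test1 = host=127.0.0.1 port=5432 dbname=test1

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000    ; the 1000 web users connect here
default_pool_size = 20    ; roughly (2 * cores) + spindles real backends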


I can do a simple unix script on the postgres server and have parallel 
updates fired for example with an ampersand at the end. Example:


echo '\timing \\update DAPP.emp_data set f1 = 123 where emp_id =0;' |
psql test1 postgres | grep Time: | cut -d' ' -f2- >>
/home/user/Documents/temp/logs/$NUM.txt &
pid1=$!
echo '\timing \\update DAPP.emp_data set f1 = 123 where emp_id =2;' |
psql test1 postgres | grep Time: | cut -d' ' -f2- >>
/home/user/Documents/temp/logs/$NUM.txt &
pid2=$!
echo '\timing \\update DAPP.emp_data set f1 = 123 where emp_id =4;' |
psql test1 postgres | grep Time: | cut -d' ' -f2- >>
/home/user/Documents/temp/logs/$NUM.txt &
pid3=$!
.


Don't do that. The overhead of starting up an echo, a psql, and a grep 
will limit the rate at which these queries can be fired at Postgres, and 
consume quite a lot of CPU. Use a proper benchmarking tool, possibly on a 
different server.
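
For instance, a hedged pgbench sketch that drives the same kind of update 
without forking an echo/psql/grep per query (script name, table and id range 
are placeholders; \setrandom is the 8.4-era syntax, newer pgbench uses 
\set id random(1, 100000) instead):

$ cat update_test.sql
\setrandom id 1 100000
UPDATE DAPP.emp_data SET f1 = 123 WHERE emp_id = :id;

$ pgbench -n -c 20 -T 300 -f update_test.sql test1

The -c 20 keeps the number of concurrent backends near the 
(2 * cores) + spindles sweet spot instead of at 1000.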


Also, you should be using a different username to postgres - that one is 
kind of reserved for superuser operations.


Matthew

--
People who love sausages, respect the law, and work with IT standards 
shouldn't watch any of them being made.  -- Peter Gutmann


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] B-Heaps

2010-06-15 Thread Matthew Wakeling

On Mon, 14 Jun 2010, Eliot Gable wrote:

Just curious if this would apply to PostgreSQL:
http://queue.acm.org/detail.cfm?id=1814327


Absolutely, and I said in 
http://archives.postgresql.org/pgsql-performance/2010-03/msg00272.php
but applied to the Postgres B-tree indexes instead of heaps. It's a pretty 
obvious performance improvement really - the principle is that when you do 
have to fetch a page from a slower medium, you may as well make it count 
for a lot.


Lots of research has already been done on this - the paper linked above is 
rather behind the times.


However, AFAIK, Postgres has not implemented this in any of its indexing 
systems.


Matthew

--
An ant doesn't have a lot of processing power available to it. I'm not trying
to be speciesist - I wouldn't want to detract you from such a wonderful
creature, but, well, there isn't a lot there, is there?
   -- Computer Science Lecturer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] slow query performance

2010-06-11 Thread Matthew Wakeling

On Fri, 11 Jun 2010, Kenneth Marshall wrote:

If you check the archives, you will see that this is not easy
to do because of the effects of caching.


Indeed. If you were to take the value at completely face value, a modern 
hard drive is capable of transferring sequential pages somewhere between 
40 and 100 times faster than random pages, depending on the drive.


However, caches tend to favour index scans much more than sequential 
scans, so using a value between 40 and 100 would discourage Postgres from 
using indexes when they are really the most appropriate option.


Matthew

--
A. Top Posters
 Q. What's the most annoying thing in the world?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] slow query

2010-06-04 Thread Matthew Wakeling

On Thu, 3 Jun 2010, Anj Adu wrote:

http://explain.depesz.com/s/kHa


I'm interested in why the two partitions dev4_act_dy_fact and 
dev4_act_dy_fact_2010_05_t3 are treated so differently. I'm guessing that 
the former is the parent and the latter the child table?


When accessing the parent table, Postgres is able to use a bitmap AND 
index scan, because it has the two indexes dev4_act_dy_dm_nd_indx and 
dev4_act_dy_dm_cd_indx. Do the child tables have a similar index setup?


Incidentally, you could get even better than a bitmap AND index scan by 
creating an index on (node_id, thedate) on each table.
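
Something along these lines on each child table, assuming node_id and 
thedate are the two filtered columns (the index name is arbitrary):

CREATE INDEX dev4_act_dy_fact_2010_05_t3_nd_date_idx
    ON dev4_act_dy_fact_2010_05_t3 (node_id, thedate);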



random_page_cost=1


I agree with Tomas that this is rarely a useful setting.

Matthew

--
You can configure Windows, but don't ask me how.   -- Bill Gates

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Weird XFS WAL problem

2010-06-04 Thread Matthew Wakeling

On Thu, 3 Jun 2010, Greg Smith wrote:
And it's also quite reasonable for a RAID controller to respond to that 
"flush the whole cache" call by flushing its cache.


Remember that the RAID controller is presenting itself to the OS as a 
large disc, and hiding the individual discs from the OS. Why should the OS 
care what has actually happened to the individual discs' caches, as long 
as that "flush the whole cache" command guarantees that the data is 
persistent. Taking the RAID array as a whole, that happens when the data 
hits the write-back cache.


The only circumstance where you actually need to flush the data to the 
individual discs is when you need to take that disc away somewhere else 
and read it on another system. That's quite a rare use case for a RAID 
array (http://thedailywtf.com/Articles/RAIDing_Disks.aspx 
notwithstanding).


If the controller had some logic that said "it's OK to not flush the 
cache when that call comes in if my battery is working fine", that would 
make this whole problem go away.


The only place this can be properly sorted is the RAID controller. 
Anywhere else would be crazy.


Matthew

--
To err is human; to really louse things up requires root
privileges. -- Alexander Pope, slightly paraphrased

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SELECT ignoring index even though ORDER BY and LIMIT present

2010-06-03 Thread Matthew Wakeling

On Wed, 2 Jun 2010, Jori Jovanovich wrote:

(2) Making the query faster by making the string match LESS specific (odd,
seems like it should be MORE)


No, that's the way round it should be. The LIMIT changes it all. Consider 
if you have a huge table, and half of the entries match your WHERE clause. 
To fetch the ORDER BY ... LIMIT 20 using an index scan would involve 
accessing only on average 40 entries from the table referenced by the 
index. Therefore, the index is quick. However, consider a huge table that 
only has twenty matching entries. The index scan would need to touch every 
single row in the table to return the matching rows, so a sequential scan, 
filter, and sort would be much faster. Of course, if you had an index 
capable of answering the WHERE clause, that would be even better for that 
case.


Matthew

--
Don't criticise a man until you have walked a mile in his shoes; and if
you do at least he will be a mile behind you and bare footed.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Weird XFS WAL problem

2010-06-03 Thread Matthew Wakeling

On Thu, 3 Jun 2010, Craig James wrote:

Also, are barriers *on* on the RAID1 mount and off on the RAID10 one?


It was the barriers.  barrier=1 isn't just a bad idea on ext4, it's a 
disaster.


This worries me a little. Does your array have a battery-backed cache? If 
so, then it should be fast regardless of barriers (although barriers may 
make a small difference). If it does not, then it is likely that the fast 
speed you are seeing with barriers off is unsafe.


There should be no "just missed the sector going past for write" problem 
ever with a battery-backed cache.


Matthew

--
There once was a limerick .sig
that really was not very big
It was going quite fine
Till it reached the fourth line

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Overusing 1 CPU

2010-06-02 Thread Matthew Wakeling

On Wed, 2 Jun 2010, Mozzi wrote:

This box is basically idle at the moment as it is still in testing, yet
top shows high usage on just 1 of the cores.


First port of call: What process is using the CPU? Run top on a fairly 
wide terminal and use the "c" button to show the full command line.


Matthew

--
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it.  - Kernighan

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-01 Thread Matthew Wakeling

On Sun, 23 May 2010, David Jarvis wrote:

The measurement table indexes (on date and weather station) were not being
used because the only given date ranges (e.g., 1900 - 2009) were causing the
planner to do a full table scan, which is correct.


I wonder if you might see some benefit from CLUSTERing the tables on the 
index.
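
A hedged sketch of what that might look like for the measurement table 
(index name and column choice are illustrative; CLUSTER is a one-off 
physical reorder, so it needs re-running as new data arrives, and it locks 
the table while it runs):

CREATE INDEX measurement_station_taken_idx
    ON climate.measurement (station_id, taken);
CLUSTER climate.measurement USING measurement_station_taken_idx;
ANALYZE climate.measurement;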


Matthew

--
And the lexer will say "Oh look, there's a null string. Oooh, there's 
another. And another.", and will fall over spectacularly when it realises
there are actually rather a lot.
- Computer Science Lecturer (edited)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL Function Language Performance: C vs PL/PGSQL

2010-06-01 Thread Matthew Wakeling

On Fri, 28 May 2010, Merlin Moncure wrote:

At best, if you are a ninja with the marginally documented backend
api, you will create code that goes about as fast as your pl/pgsql
function for 10 times the amount of input work, unless there are heavy
amounts of 'other than sql' code in your function.  The reason to
write C in the backend is:

*) Interface w/3rd party libraries w/C linkage
*) Do things that are illegal in regular SQL (write files, etc)
*) Make custom types


The major case I found when writing pl/pgsql was when trying to build 
arrays row by row. AFAIK when I tried it, adding a row to an array caused 
the whole array to be copied, which put a bit of a damper on performance.


Matthew

--
The problem with defending the purity of the English language is that
English is about as pure as a cribhouse whore. We don't just borrow words;
on occasion, English has pursued other languages down alleyways to beat
them unconscious and rifle their pockets for new vocabulary. - James Nicoll

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL Function Language Performance: C vs PL/PGSQL

2010-06-01 Thread Matthew Wakeling

On Tue, 1 Jun 2010, Stephen Frost wrote:

* Matthew Wakeling (matt...@flymine.org) wrote:

The major case I found when writing pl/pgsql was when trying to build
arrays row by row. AFAIK when I tried it, adding a row to an array caused
the whole array to be copied, which put a bit of a damper on performance.


Using the built-ins now available in 8.4 (array_agg), that copying
doesn't happen any more.


Thanks. I had wondered if that had been improved.
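
A minimal sketch of the difference inside a PL/pgSQL function (table and 
column names are invented):

CREATE OR REPLACE FUNCTION collect_vals() RETURNS integer[] AS $$
DECLARE
    arr integer[] := '{}';
    r RECORD;
BEGIN
    -- pre-8.4 pattern: each append copies the whole array built so far
    FOR r IN SELECT val FROM source_table LOOP
        arr := arr || r.val;
    END LOOP;
    -- 8.4 and later: one aggregate, no repeated copying
    SELECT array_agg(val) INTO arr FROM source_table;
    RETURN arr;
END;
$$ LANGUAGE plpgsql;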

Matthew

--
Our riverbanks and seashores have a beauty all can share, provided
there's at least one boot, three treadless tyres, a half-eaten pork
pie, some oil drums, an old felt hat, a lorry-load of tar blocks,
and a broken bedstead there.   -- Flanders and Swann

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread Matthew Wakeling

On Fri, 21 May 2010, Yeb Havinga wrote:

For time based data I would for sure go for year based indexing.


On the contrary, most of the queries seem to be over many years, but 
rather restricting on the time of year. Therefore, partitioning by month 
or some other per-year method would seem sensible.


Regarding the leap year problem, you might consider creating a modified 
day of year field, which always assumes that the year contains a leap day. 
Then a given number always resolves to a given date, regardless of year. 
If you then partition (or index) on that field, then you may get a 
benefit.


In this case, partitioning is only really useful when you are going to be 
forced to do seq scans. If you can get a suitably selective index, in the 
case where you are selecting a small proportion of the data, then I would 
concentrate on getting the index right, rather than the partition, and 
maybe even not do partitioning.


Matthew

--
Trying to write a program that can't be written is... well, it can be an
enormous amount of fun! -- Computer Science Lecturer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread Matthew Wakeling
Regarding the leap year problem, you might consider creating a modified day 
of year field, which always assumes that the year contains a leap day. Then 
a given number always resolves to a given date, regardless of year. If you 
then partition (or index) on that field, then you may get a benefit.

On Fri, 21 May 2010, Yeb Havinga wrote:
Shouldn't it be just the other way around - assume all years are non leap 
years for the doy part field to be indexed.


The mapping doesn't matter massively, as long as all days of the year can 
be mapped uniquely onto a number, and the numbers are sequential. Your 
suggestion does not satisfy the first of those two requirements.


If you assume that all years are leap years, then you merely skip a number 
in the middle of the year, which isn't a problem when you want to check 
for days between two bounds. However, if you assume non leap year, then 
there is no representation for the 29th of February, so not all data 
points will have a representative number to insert into the database.
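
A simpler variant that meets the unique-and-ordered requirement without 
worrying about leap days at all is to index month * 32 + day, which is 
immutable for a date or timestamp-without-time-zone column and therefore 
indexable. It is not the leap-year day number described above - the numbers 
have gaps - but it orders the days of the year the same way (names are 
illustrative):

CREATE INDEX measurement_month_day_idx
    ON climate.measurement
    ((extract(MONTH FROM taken) * 32 + extract(DAY FROM taken)));

-- a season query, e.g. 5 May to 20 June of any year:
--   WHERE extract(MONTH FROM taken) * 32 + extract(DAY FROM taken)
--         BETWEEN 5 * 32 + 5 AND 6 * 32 + 20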


Matthew

--
No, C++ isn't equal to D. 'C' is undeclared, so we assume it's an int,
with a default value of zero.  Hence, C++ should really be called 1.
-- met24, commenting on the quote C++ -- shouldn't it be called D?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] prepared query performs much worse than regular query

2010-05-21 Thread Matthew Wakeling

On Fri, 21 May 2010, Richard Yen wrote:

Any ideas why the query planner chooses a different query plan when using 
prepared statements?


This is a FAQ. Preparing a statement makes Postgres create a plan, without 
knowing the values that you will plug in, so it will not be as optimal as 
if the values were available. The whole idea is to avoid the planning cost 
each time the query is executed, but if your data is unusual it can 
result in worse plans.


Matthew

--
Existence is a convenient concept to designate all of the files that an
executable program can potentially process.   -- Fortran77 standard

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Matthew Wakeling

On Wed, 19 May 2010, David Jarvis wrote:

 extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND


That portion of the WHERE clause cannot use an index on m.taken. Postgres 
does not look inside functions (like extract) to see if something 
indexable is present. To get an index to work, you could create an index 
on (extract(YEAR FROM m.taken)).
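
A hedged sketch of such an expression index (the extra parentheses are 
required around an expression, and the WHERE clause must then use the 
identical expression; this assumes m.taken is a date or timestamp without 
time zone, since extract on timestamptz is not immutable and cannot be 
indexed directly):

CREATE INDEX measurement_taken_year_idx
    ON climate.measurement ((extract(YEAR FROM taken)));

-- then e.g.: WHERE extract(YEAR FROM taken) BETWEEN 1900 AND 2009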


Matthew

--
Here we go - the Fairy Godmother redundancy proof.
   -- Computer Science Lecturer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Matthew Wakeling

On Thu, 20 May 2010, David Jarvis wrote:

I took out the date conditions:

SELECT
 m.*
FROM
 climate.measurement m
WHERE
 m.category_id = 1 and
 m.station_id = 2043

This uses the station indexes:


Yes, because there is only one station_id selected. That's exactly what an 
index is for.



Then combined the selection of the station:
The station index is no longer used, resulting in full table scans:



Nested Loop  (cost=0.00..994.94 rows=4046 width=4)
  Join Filter: ((6371.009::double precision *
sqrt((pow(radians(((c.latitude_decimal - s.latitude_decimal))::double
precision), 2::double precision) + (cos((radians(((c.latitude_decimal +
s.latitude_decimal))::double precision) / 2::double precision)) *
pow(radians(((c.longitude_decimal - s.longitude_decimal))::double
precision), 2::double precision) <= 25::double precision)
  ->  Index Scan using city_pkey1 on city c  (cost=0.00..6.27 rows=1
width=16)
        Index Cond: (id = 5182)
  ->  Seq Scan on station s  (cost=0.00..321.08 rows=12138 width=20)
        Filter: ((s.elevation >= 0) AND (s.elevation <= 3000))

I get a set of 78 rows returned in very little time.


(An EXPLAIN ANALYSE would be better here). Look at the expected number of 
stations returned. It expects 4046 which is a large proportion of the 
available stations. It therefore expects to have to touch a large 
proportion of the measurement table, therefore it thinks that it will be 
fastest to do a seq scan. In actual fact, for 78 stations, the index would 
be faster, but for 4046 it wouldn't.


If you will be querying by season quite regularly, had you considered 
partitioning by season?


Matthew

--
Geography is going places.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] merge join killing performance

2010-05-19 Thread Matthew Wakeling

On Wed, 19 May 2010, Scott Marlowe wrote:

It's apparently estimating (wrongly) that the merge join won't have to
scan very much of files before it can stop because it finds an eid
value larger than any eid in the other table.  So the issue here is an
inexact stats value for the max eid.


I wandered if it could be something like that, but I rejected that idea, 
as it obviously wasn't the real world case, and statistics should at least 
get that right, if they are up to date.



I changed stats target to 1000 for that field and still get the bad plan.


What do the stats say the max values are?

Matthew

--
Nog: Look! They've made me into an ensign!
O'Brien: I didn't know things were going so badly.
Nog: Frightening, isn't it?
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] merge join killing performance

2010-05-18 Thread Matthew Wakeling

On Tue, 18 May 2010, Scott Marlowe wrote:

Aggregate  (cost=902.41..902.42 rows=1 width=4)
  ->  Merge Join  (cost=869.97..902.40 rows=1 width=4)
        Merge Cond: (f.eid = ev.eid)
        ->  Index Scan using files_eid_idx on files f
            (cost=0.00..157830.39 rows=3769434 width=8)


Okay, that's weird. How is the cost of the merge join only 902, when the 
cost of one of the branches 157830, when there is no LIMIT?


Are the statistics up to date?

Matthew

--
As you approach the airport, you see a sign saying Beware - low
flying airplanes. There's not a lot you can do about that. Take 
your hat off?  -- Michael Flanders


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planner not using column limit specified for one column for another column equal to first

2010-04-19 Thread Matthew Wakeling

On Sat, 17 Apr 2010, Віталій Тимчишин wrote:

As for making the planner more clever, maybe it is possible to introduce a
division into fast queries and long queries, so that if after fast
planning the cost is greater than some configurable threshold, advanced planning
techniques (or settings) are used. As far as I have seen in this list, many
techniques are not used simply because they are too complex and could make
planning take too much time for really fast queries, but they are vital for
long ones.


+1. That's definitely a good idea in my view. The query optimiser I wrote 
(which sits on top of Postgres and makes use of materialised views to 
speed up queries) uses a similar approach - it expends effort proportional 
to the estimated cost of the query, as reported by EXPLAIN.


Matthew

--
To most people, solutions mean finding the answers. But to chemists,
solutions are things that are still all mixed up.
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] LIMIT causes planner to do Index Scan using a less optimal index

2010-04-07 Thread Matthew Wakeling

On Wed, 7 Apr 2010, sherry.ctr@faa.gov wrote:

Please just let me know if Postgres can do this kind of index or not.

create index idx1 on tb1(col1, col2)

Then later we can find it is useful or useless.


Have you tried it?

Grzegorz Jaśkiewicz gryz...@gmail.com wrote: 

something like this:   create index idx1 on tb1(col1, col2);
yup :)


For those of you who are not native English speakers, "Yup" is a synonym 
for "Yes".


Matthew

--
Richards' Laws of Data Security:
1. Don't buy a computer.
2. If you must buy a computer, don't turn it on.
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] experiments in query optimization

2010-03-31 Thread Matthew Wakeling

On Tue, 30 Mar 2010, Faheem Mitha wrote:

work_mem = 1 GB (see diag.{tex/pdf}).


Sure, but define "sane setting", please. I guess part of the point is that I'm 
trying to keep memory low


You're trying to keep memory usage low, but you have work_mem set to 1GB?

Matthew

--
Prove to thyself that all circuits that radiateth and upon which thou worketh
are grounded, lest they lift thee to high-frequency potential and cause thee
to radiate also.  -- The Ten Commandments of Electronics

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance regarding LIKE searches

2010-03-30 Thread Matthew Wakeling

On Mon, 29 Mar 2010, randa...@bioinfo.wsu.edu wrote:

WHERE ... lower(n.name) LIKE 'Scaffold:scaffold_163:1000..1199%' ...


I'm sure you noticed that this is never going to return any rows?

Matthew

--
Me... a skeptic?  I trust you have proof?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimizer showing wrong rows in plan

2010-03-29 Thread Matthew Wakeling

On Mon, 29 Mar 2010, Tadipathri Raghu wrote:

As per the documentation, one page is 8kb; when I create a table with int as
one column it's 4 bytes. If I insert 2000 rows, they should be in one page only
as it's 8kb, but it's extending vastly more than expected. Example shown below,
taking the previous example table test with one column.


There is more to a row than just the single int column. The space used by 
a row will include a row header (with transaction ids, hint bits, and 
possibly an oid), a line pointer in the page, and finally your data 
columns. That takes a bit more space.
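
As a rough, hedged calculation for a 64-bit build: each tuple carries a 
23-byte header padded to 24 bytes, the single 4-byte integer then pads the 
tuple out to 32 bytes, and each tuple also needs a 4-byte line pointer in 
the page, so roughly 36 bytes per row. An 8kB page, minus its 24-byte page 
header, therefore holds a little over 220 such rows, and 2000 single-integer 
rows land in roughly 9 pages rather than one.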


Matthew

--
If you let your happiness depend upon how somebody else feels about you,
now you have to control how somebody else feels about you. -- Abraham Hicks

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] memory question

2010-03-25 Thread Matthew Wakeling

On Wed, 24 Mar 2010, Campbell, Lance wrote:

I have 24 Gig of memory on my server...

Our server manager seems to think that I have way too much memory.  He
thinks that we only need 5 Gig.


Your organisation probably spent more money getting your server manager to 
investigate how much RAM you need and scaring you about wasting resources, 
than it would cost to just slap 24GB in the machine.


24GB is the least amount of RAM I would consider putting in a new server 
nowadays. It's so cheap.


Matthew

--
Lord grant me patience, and I want it NOW!

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] too complex query plan for not exists query and multicolumn indexes

2010-03-22 Thread Matthew Wakeling

On Fri, 19 Mar 2010, Stephen Frost wrote:
...it has to go to an external on-disk sort (see later on, and how to 
fix that).


This was covered on this list a few months ago, in 
http://archives.postgresql.org/pgsql-performance/2009-08/msg00184.php and 
http://archives.postgresql.org/pgsql-performance/2009-08/msg00189.php


There seemed to be some consensus that allowing a materialise in front of 
an index scan might have been a good change. Was there any movement on 
this front?



Limit  (cost=66681.50..66681.50 rows=1 width=139) (actual
time=7413.489..7413.489 rows=1 loops=1)
  ->  Merge Anti Join  (cost=40520.17..66681.50 rows=367793 width=139)
(actual time=3705.078..7344.256 rows=101 loops=1)
        Merge Cond: ((f1.user_id = f2.ref_id) AND (f1.ref_id =
f2.user_id))
        ->  Index Scan using user_ref on friends f1
(cost=0.00..26097.86 rows=2818347 width=139) (actual
time=0.093..1222.592 rows=1917360 loops=1)
        ->  Materialize  (cost=40520.17..40555.40 rows=2818347 width=8)
(actual time=3704.977..5043.347 rows=1990148 loops=1)
              ->  Sort  (cost=40520.17..40527.21 rows=2818347 width=8)
(actual time=3704.970..4710.703 rows=1990148 loops=1)
                    Sort Key: f2.ref_id, f2.user_id
                    Sort Method:  external merge  Disk: 49576kB
                    ->  Seq Scan on friends f2  (cost=0.00..18143.18
rows=2818347 width=8) (actual time=0.015..508.797 rows=2818347 loops=1)
Total runtime: 7422.516 ms



If you had an index on ref_id,user_id (as well as the one on
user_id,ref_id), it'd probably be able to do in-order index traversals
on both and be really fast...  But then updates would be more expensive,
of course, since it'd have more indexes to maintain.


That isn't necessarily so, until the issue referred to in the above linked 
messages is resolved. It depends.


Matthew

--
I've run DOOM more in the last few days than I have the last few
months.  I just love debugging ;-)  -- Linus Torvalds

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] GiST index performance

2010-03-22 Thread Matthew Wakeling

On Sat, 20 Mar 2010, Yeb Havinga wrote:

The gist virtual pages would then match more the original blocksizes that
were used in Guttman's R-tree paper (first google result, then figure 4.5).
Since the nature/characteristics of the underlying datatypes and keys is not
changed, it might be that with the disk pages getting larger, gist indexing
has therefore become unexpectedly inefficient.


Yes, that is certainly a factor. For example, the page size for bioseg 
which we use here is 130 entries, which is very excessive, and doesn't 
allow very deep trees. On the other hand, it means that a single disc seek 
performs quite a lot of work.



But I am also not really into the core-gist code, but do have a motivation
to dive into it (more than 200% performance increase in Matthew's test case).
However I'd like to verify for community support before working on it.


I'd also love to dive into the core gist code, but am rather daunted by 
it. I believe that there is something there that is taking more time than 
I can account for. The indexing algorithm itself is good.


Matthew

--
The problem with defending the purity of the English language is that
English is about as pure as a cribhouse whore. We don't just borrow words;
on occasion, English has pursued other languages down alleyways to beat
them unconscious and rifle their pockets for new vocabulary. - James Nicoll

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] GiST index performance

2010-03-22 Thread Matthew Wakeling

On Mon, 22 Mar 2010, Yeb Havinga wrote:
Yes, that is certainly a factor. For example, the page size for bioseg 
which we use here is 130 entries, which is very excessive, and doesn't 
allow very deep trees. On the other hand, it means that a single disc seek 
performs quite a lot of work.


Yeah, I only did in-memory fitting tests and wondered about increased io's. 
However I bet that even for bigger than ram db's, the benefit of having to 
fan out to less pages still outweighs the over-general non leaf nodes and 
might still result in less disk io's. I redid some earlier benchmarking with 
other datatypes with a 1kB block size and also multicolumn gist and the 
multicolumn variant had an even greater benefit than the single column 
indexes, both equality and range scans. (Like execution times down to 20% of 
original). If gist is important to you, I really recommend doing a test with 
1kB blocks.


Purely from a disc seek count point of view, assuming an infinite CPU 
speed and infinite disc transfer rate, the larger the index pages the 
better. The number of seeks per fetch will be equivalent to the depth of 
the tree.


If you take disc transfer rate into account, the break-even point is when 
you spend an equal time transferring as seeking, which places the page 
size around 500kB on a modern disc, assuming RAID stripe alignment doesn't 
make that into two seeks instead of one.


However, for efficient CPU usage, the ideal page size for a tree index is 
much smaller - between two and ten entries, depending on the type of the 
data.


There may be some mileage in reorganising indexes into a two-level system. 
That is, have an index format where the page size is 512kB or similar, but 
each page is internally a CPU-efficient tree itself.


However, this is beyond the scope of the problem of speeding up gist.

Matthew

--
If you let your happiness depend upon how somebody else feels about you,
now you have to control how somebody else feels about you. -- Abraham Hicks

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_dump far too slow

2010-03-18 Thread Matthew Wakeling

On Sun, 14 Mar 2010, David Newall wrote:

  nohup time pg_dump -f database.dmp -Z9 database

I presumed pg_dump was CPU-bound because of gzip compression, but a test I 
ran makes that seem unlikely...


There was some discussion about this a few months ago at 
http://archives.postgresql.org/pgsql-performance/2009-07/msg00348.php


It seems that getting pg_dump to do the compression is a fair amount 
slower than piping the plain format dump straight through gzip. You get a 
bit more parallelism that way too.
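
A hedged sketch of the piped variant (file names are placeholders, and the 
compression level is a trade-off worth testing):

nohup time pg_dump database | gzip > database.dmp.gz &

or, spending less CPU on compression:

pg_dump database | gzip -1 > database.dmp.gz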


Matthew


--
I'm always interested when [cold callers] try to flog conservatories.
Anyone who can actually attach a conservatory to a fourth floor flat
stands a marginally better than average chance of winning my custom.
(Seen on Usenet)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Deleting bytea, autovacuum, and 8.2/8.4 differences

2010-03-15 Thread Matthew Wakeling

On Mon, 15 Mar 2010, Tom Lane wrote:

For an example like this one, you have to keep in mind that the
toast-table rows for the large bytea value have to be marked deleted,
too.  Also, since I/O happens in units of pages, the I/O volume to
delete a tuple is just as much as the I/O to create it.  (The WAL
entry for deletion might be smaller, but that's all.)  So it is entirely
unsurprising that DELETE FROM foo is about as expensive as filling the
table initially.

If deleting a whole table is significant for you performance-wise,
you might look into using TRUNCATE instead.


What are the implications of using TRUNCATE on a table that has TOASTed 
data? Is TOAST all stored in one single table, or is it split up by owner 
table/column name? Might you still end up with a normal delete operation 
on the TOAST table when performing a TRUNCATE on the owner table?


Matthew

--
sed -e '/^[when][coders]/!d;/^...[discover].$/d;/^..[real].[code]$/!d
' `locate dict/words`

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] GiST index performance

2010-03-15 Thread Matthew Wakeling

On Thu, 25 Feb 2010, Bruce Momjian wrote:

Was there ever any conclusion on this issue?


Not really. Comments inline:


Matthew Wakeling wrote:

Revisiting the thread a month back or so, I'm still investigating
performance problems with GiST indexes in Postgres.

Looking at http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items I'd
like to clarify the contrib/seg issue. Contrib/seg is vulnerable to
pathological behaviour which is fixed by my second patch, which can be
viewed as complete. Contrib/cube, being multi-dimensional, is not affected
to any significant degree, so should not need alteration.


This issue is addressed by my patch, which AFAIK noone has reviewed. 
However, that patch was derived from a patch that I applied to bioseg, 
which is itself a derivative of seg. This patch works very well indeed, 
and gave an approximate 100 times speed improvement in the one test I ran.


So you could say that the sister patch of the one I submitted is tried and 
tested in production.



A second quite distinct issue is the general performance of GiST indexes
which is also mentioned in the old thread linked from Open Items. For
that, we have a test case at
http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php for
btree_gist indexes. I have a similar example with the bioseg GiST index. I
have completely reimplemented the same algorithms in Java for algorithm
investigation and instrumentation purposes, and it runs about a hundred
times faster than in Postgres. I think this is a problem, and I'm willing
to do some investigation to try and solve it.


I have not made any progress on this issue. I think Oleg and Teodor would 
be better placed working it out. All I can say is that I implemented the 
exact same indexing algorithm in Java, and it performed 100 times faster 
than Postgres. Now, Postgres has to do a lot of additional work, like 
mapping the index onto disc, locking pages, and abstracting to plugin user 
functions, so I would expect some difference - I'm not sure 100 times is 
reasonable though. I tried to do some profiling, but couldn't see any one 
section of code that was taking too much time. Not sure what I can further 
do.


Matthew

--
Some people, when confronted with a problem, think I know, I'll use regular
expressions. Now they have two problems.  -- Jamie Zawinski

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Dell PERC H700/H800

2010-02-11 Thread Matthew Wakeling


Just a heads up - apparently the more recent Dell RAID controllers will no 
longer recognise hard discs that weren't sold through Dell.


http://www.channelregister.co.uk/2010/02/10/dell_perc_11th_gen_qualified_hdds_only/

As one of the comments points out, that kind of makes them no longer SATA 
or SAS compatible, and they shouldn't be allowed to use those acronyms any 
more.


Matthew

--
An optimist sees the glass as half full, a pessimist as half empty,
and an engineer as having redundant storage capacity.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimizing Postgresql server and FreeBSD for heavy read and writes

2010-02-04 Thread Matthew Wakeling

On Thu, 4 Feb 2010, Amitabh Kant wrote:

On Wed, Feb 3, 2010 at 10:05 PM, Ivan Voras ivo...@freebsd.org wrote:
If you can, add another 2 drives in RAID 1 and move+symlink the pg_xlog 
directory to the new array.


Can't do anything about this server now, but would surely keep in mind
before upgrading other servers. Would you recommend the same speed
drives(15K SAS) for RAID 1, or would a slower drive also work here (10K SAS
or even SATA II)?


The performance requirements for the WAL are significantly lower than for 
the main database. This is for two reasons - firstly the WAL is 
write-only, and has no other activity. The WAL only gets read again in the 
event of a crash. Secondly, writes to the WAL are sequential writes, which 
is the fastest mode of operation for a disc, whereas the main database 
discs will have to handle random access.


The main thing you need to make sure of is that the WAL is on a disc 
system that has a battery-backed up cache. That way, it will be able to 
handle the high rate of fsyncs that the WAL generates, and the cache will 
convert that into a simple sequential write. Otherwise, you will be 
limited to one fsync every 5ms (or whatever the access speed of your WAL 
discs is).


If you make sure of that, then there is no reason to get expensive fast 
discs for the WAL at all (assuming they are expensive enough to not lie 
about flushing writes properly).


Matthew

--
So, given 'D' is undeclared too, with a default of zero, C++ is equal to D.
 mnw21, commenting on the Surely the value of C++ is zero, but C is now 1
 response to No, C++ isn't equal to D. 'C' is undeclared [...] C++ should
 really be called 1 response to C++ -- shouldn't it be called D?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] System overload / context switching / oom, 8.3

2010-02-03 Thread Matthew Wakeling

On Tue, 2 Feb 2010, Rob wrote:

pg 8.3.9, Debian Etch, 8gb ram, quadcore xeon, megaraid (more details at end)
~240 active databases, 800+ db connections via tcp.



Linux 2.6.18-6-686-bigmem #1 SMP Thu Nov 5 17:30:05 UTC 2009 i686
GNU/Linux (Debian Etch)

8 GB RAM
4 Quad Core Intel(R) Xeon(R) CPU   E5440  @ 2.83GHz stepping 06


My advice?

1. Switch to 64-bit operating system and Postgres. Debian provides that, 
and it works a charm. You have a 64-bit system, so why not use it?


2. Buy more RAM. Think about it - you have 800 individual processes 
running on your box, and they will all want their own process space. To be 
honest, I'm impressed that the current machine works at all. You can get 
an idea of how much RAM you might need by multiplying the number of 
connections by (work_mem + about 3MB), and add on shared_buffers. So even 
when the system is idle you're currently burning 3200MB just sustaining 
800 processes - more if they are actually doing something.


3. Try to reduce the number of connections to the database server.

4. Think about your work_mem. Finding the correct value for you is going 
to be a matter of testing. Smaller values will result in large queries 
running slowly, but have the danger of driving the system to swap and OOM.


Matthew

--
A good programmer is one who looks both ways before crossing a one-way street.
Considering the quality and quantity of one-way streets in Cambridge, it
should be no surprise that there are so many good programmers there.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Benchmark shows very slow bulk delete

2010-01-27 Thread Matthew Wakeling

On Wed, 27 Jan 2010, Thom Brown wrote:

Had a quick look at a benchmark someone put together of MySQL vs PostgreSQL,
and while PostgreSQL is generally faster, I noticed the bulk delete was very
slow: http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html

Is this normal?


On the contrary, TRUNCATE TABLE is really rather fast.

Seriously, the Postgres developers, when designing the system, decided on 
a database layout that was optimised for the most common cases. Bulk 
deletion of data is not really that common an operation, unless you are 
deleting whole categories of data, where setting up partitioning and 
deleting whole partitions would be sensible.


Other complications are that the server has to maintain concurrent 
integrity - that is, another transaction must be able to see either none 
of the changes or all of them. As a consequence of this, Postgres needs to 
do a sequential scan through the table and mark the rows for deletion in 
the transaction, before flipping the transaction committed status and 
cleaning up afterwards.


I'd be interested in how mysql manages to delete a whole load of rows in 
0.02 seconds. How many rows is that?


(Reading in the comments, I saw this: "The slow times for Postgresql Bulk 
Modify/Bulk Delete can be explained by foreign key references to the 
updates table." I'm not sure that fully explains it though, unless there 
are basically zero rows being deleted - it's hardly bulk then, is it?)


Matthew

--
People who love sausages, respect the law, and work with IT standards 
shouldn't watch any of them being made.  -- Peter Gutmann


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] test send (recommended by Dave Page)

2010-01-27 Thread Matthew Wakeling

On Wed, 27 Jan 2010, Mark Steben wrote:

Subject: [PERFORM] test send (recommended by Dave Page)

Hi all - sorry to create additional email 'noise', but I've been trying to 
post a rather long query to the pgsql-performance user list. Dave thought 
that it might have been bounced due to the length and suggested I send a 
short 'blast'.

If this works I'll send a shortened version of my query later.


Whatever you do, don't try to send an email to the list with the word 
"help" in the subject. The mailing list software will silently throw away 
your email. Helpful, for a help mailing list.


Matthew

--
The early bird gets the worm, but the second mouse gets the cheese.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-27 Thread Matthew Wakeling

On Wed, 27 Jan 2010, Віталій Тимчишин wrote:

How about SELECT SUM (case when id > 120 and id < 121 then 1 end)
from tbl_tracker;


That is very interesting.


* All the functions should be noop for null input


Alas, not true for COUNT(*), AVG(), etc.

Matthew

--
An optimist sees the glass as half full, a pessimist as half empty,
and an engineer as having redundant storage capacity.
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] splitting data into multiple tables

2010-01-26 Thread Matthew Wakeling

On Mon, 25 Jan 2010, Viji V Nair wrote:

I think this won't help that much if you have a single machine. Partition the
table and keep the data in different nodes. Have a look at tools like
pgpool-II.


So partitioning. You have three choices:

1. Use a single table
2. Partition the table on the same server
3. Partition the data across multiple servers.

This is in increasing order of complexity.

There will probably be no problem at all with option 1. The only problem 
arises if you run a query that performs a full sequential scan of the 
entire table, which would obviously take a while. If your queries are 
indexable, then option 1 is almost certainly the best option.


Option 2 adds complexity in the Postgres server. You will need to 
partition your tables in a logical manner - that is, there needs to be 
some difference between rows in table a compared to rows in table b. This 
means that the partitioning will in effect be a little like indexing. You 
do not want to have too many partitions. The advantage is that if a query 
requires a full sequential scan, then there is the possibility of skipping 
some of the partitions, although there is some complexity involved in 
getting this to work correctly. In a lot of cases, partitioning will make 
queries slower by confusing the planner.
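
A minimal sketch of option 2 with the inheritance machinery available at the 
time (table and column names are invented; constraint_exclusion needs to be 
set to 'partition' or 'on' for the planner to skip non-matching children):

CREATE TABLE events (id bigint, logdate date, payload text);

CREATE TABLE events_2010_01 (
    CHECK (logdate >= DATE '2010-01-01' AND logdate < DATE '2010-02-01')
) INHERITS (events);

CREATE INDEX events_2010_01_logdate_idx ON events_2010_01 (logdate);

-- plus one child table per range, and a trigger or rule on the parent
-- to route INSERTs to the right child.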


Option 3 is only useful when you have a real performance problem with 
long-running queries (partitioning the data across servers) or with very 
large numbers of queries (duplicating the data across servers). It also 
adds much complexity. It is fairly simple to run a filter these results 
from the table queries across multiple servers, but if that was all you 
were doing, you may as well use an index instead. It becomes impossible to 
perform proper cross-referencing queries without some very clever software 
(because not all the data is available on the server), which will probably 
be hard to manage and slow down the execution anyway.


My recommendation would be to stick with a single table unless you have a 
real need to partition.


Matthew

--
Note: some countries impose serious penalties for a conspiracy to overthrow
 the political system. THIS DOES NOT FIX THE VULNERABILITY.
  -- http://seclists.org/vulnwatch/2003/q2/0002.html

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] splitting data into multiple tables

2010-01-26 Thread Matthew Wakeling

On Mon, 25 Jan 2010, nair rajiv wrote:
I am working on a project that will take out structured content from 
wikipedia and put it in our database...

there is a table which will approximately have 5 crore entries after data
harvesting.


Have you asked the Wikimedia Foundation if they mind you consuming that 
much of their bandwidth, or even if there are copyright issues involved in 
grabbing that much of their data?


(The other problem with using the word crore is that although it may 
mean 10,000,000 in a few countries, it could also mean 500,000.)


Matthew

--
Of course it's your fault. Everything here's your fault - it says so in your
contract.- Quark

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-26 Thread Matthew Wakeling

On Tue, 26 Jan 2010, Richard Neill wrote:
SELECT SUM (case when id > 120 and id < 121 then 1 else 0 end) from 
tbl_tracker;


Explain shows that this does a sequential scan.


I'd defer to Tom on this one, but really, for Postgres to work this out, 
it would have to peer deep into the mysterious SUM function, and realise 
that the number zero is a noop. I suppose it would be possible, but you'd 
have to define noops for each of the different possible functions, *and* 
make the planner clever enough to spot the noop-matching number in the 
else and convert the WHEN into a WHERE.


In my mind, this is quite a lot of work for the planner to do to solve 
this one. That translates into quite a lot of work for some poor 
programmer to do to achieve it. If you have the money, then hire someone 
to do it!


Matthew

--
I don't want the truth. I want something I can tell parliament!
 -- Rt. Hon. Jim Hacker MP

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-25 Thread Matthew Wakeling

On Mon, 25 Jan 2010, Richard Huxton wrote:
OK - so the first query processes 19,799 rows in 31,219 ms (about 1.5ms per 
row)


The second processes 2,606 rows in 3,813 ms (about 1.3ms per row).


Agreed. One query is faster than the other because it has to do an eighth 
the amount of work.


Matthew

--
I wouldn't be so paranoid if you weren't all out to get me!!

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sql result b where condition

2010-01-25 Thread Matthew Wakeling

On Mon, 25 Jan 2010, A. Kretschmer wrote:

In response to ramasubramanian :

Please, create a new mail for a new topic and don't hijack other
threads.


Even more so - this isn't probably the right mailing list for generic sql 
help questions.



select ENAME,ORIG_SALARY from employee where (ename='Tom' and
orig_salary=2413)or(orig_salary=1234 )

if the first condition (ename='Tom' and orig_salary=2413) is satisfied then 10
rows will be returned; for the second condition (orig_salary=1234) there
are 20 rows that will be returned.
The order of display should be

The first 10 rows then
next 20 rows.



select ENAME,ORIG_SALARY, 1 as my_order from employee where (ename='Tom' and
orig_salary=2413) union all select ENAME,ORIG_SALARY, 2 as my_order from employee where
(orig_salary=1234) order by my_order.


Or just:

select ENAME,ORIG_SALARY from employee where (ename='Tom' and
orig_salary=2413)or(orig_salary=1234 ) ORDER BY orig_salary DESC

as there is going to be only two values for orig_salary.

Matthew

--
The early bird gets the worm. If you want something else for breakfast, get
up later.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-22 Thread Matthew Wakeling

On Fri, 22 Jan 2010, Tory M Blue wrote:

But the same sql that returns maybe 500 rows is pretty fast, it's the return
of 10K+ rows that seems to stall and is CPU Bound.


Okay, so you have two differing cases. Show us the EXPLAIN ANALYSE for 
both of them, and we will see what the difference is.
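
For example (the table and column here are placeholders, not your schema):

EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE some_column = 42;

Run that for both the 500-row case and the 10K+ row case and post the complete
output of each.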


Matthew

--
The third years are wandering about all worried at the moment because they
have to hand in their final projects. Please be sympathetic to them, say
things like ha-ha-ha, but in a sympathetic tone of voice 
   -- Computer Science Lecturer




Re: [PERFORM] a heavy duty operation on an unused table kills my server

2010-01-21 Thread Matthew Wakeling

On Wed, 20 Jan 2010, Greg Smith wrote:
Basically, to an extent, that's right. However, when you get 16 drives or 
more into a system, then it starts being an issue.


I guess if I test a system with *only* 16 drives in it one day, maybe I'll 
find out.


*Curious* What sorts of systems have you tried so far?

As the graph I just sent shows, the four schedulers are pretty-much 
identical in performance, until you start saturating it with simultaneous 
requests. CFQ levels out at a performance a little lower than the other 
three.


Seriously though, there is some difference between a completely synthetic 
test like you noted issues with here, and anything you can see when running 
the database.


Granted, this test is rather synthetic. It is testing the rather unusual 
case of lots of simultaneous random small requests - more simultaneous 
requests than we advise people to run backends on a server. You'd probably 
need to get a RAID array a whole lot bigger than 16 drives to have a 
normal workload capable of demonstrating the performance difference, and 
even that isn't particularly major.


Would be interesting research if anyone has a 200-spindle RAID array 
hanging around somewhere.


Matthew

--
A good programmer is one who looks both ways before crossing a one-way street.
Considering the quality and quantity of one-way streets in Cambridge, it
should be no surprise that there are so many good programmers there.



Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-21 Thread Matthew Wakeling

On Thu, 21 Jan 2010, Greg Smith wrote:

In the attachement you'll find 2 screenshots perfmon34.png
and perfmon35.png (I hope 2x14 kb is o.k. for the mailing
list).


I don't think they made it to the list?


No, it seems that no emails with image attachments ever make it through 
the list server. Someone mentioned something about banning the guy who set 
the list up from the internet or something. 
http://archives.postgresql.org/pgsql-performance/2008-01/msg00290.php


Matthew

--
Bashir: The point is, if you lie all the time, nobody will believe you, even
when you're telling the truth. (RE: The boy who cried wolf)
Garak: Are you sure that's the point, Doctor?
Bashir: What else could it be?-- Star Trek DS9
Garak: That you should never tell the same lie twice. -- Improbable Cause



Re: [PERFORM] a heavy duty operation on an unused table kills my server

2010-01-20 Thread Matthew Wakeling

On Fri, 15 Jan 2010, Greg Smith wrote:
It seems to me that CFQ is simply bandwidth limited by the extra processing 
it has to perform.


I'm curious what you are doing when you see this.


16 disc 15kRPM RAID0, when using fadvise with more than 100 simultaneous 
8kB random requests. I sent an email to the mailing list on 29 Jan 2008, 
but it got thrown away by the mailing list spam filter because it had an 
image in it (the graph showing interesting information). Gregory Stark 
replied to it in 
http://archives.postgresql.org/pgsql-performance/2008-01/msg00285.php


I was using his synthetic test case program.

My theory has been that the extra processing it has to perform you describe 
just doesn't matter in the context of a fast system where physical I/O is 
always the bottleneck.


Basically, to an extent, that's right. However, when you get 16 drives or 
more into a system, then it starts being an issue.


Matthew

--
For every complex problem, there is a solution that is simple, neat, and wrong.
 -- H. L. Mencken 




Re: [PERFORM] a heavy duty operation on an unused table kills my server

2010-01-15 Thread Matthew Wakeling

On Thu, 14 Jan 2010, Greg Smith wrote:

Andy Colson wrote:
So if there is very little io, or if there is way way too much, then the 
scheduler really doesn't matter.  So there is a slim middle ground where 
the io is within a small percent of the HD capacity where the scheduler 
might make a difference?


That's basically how I see it.  There seem to be people who run into 
workloads in the middle ground where the scheduler makes a world of 
difference.  I've never seen one myself, and suspect that some of the reports 
of deadline being a big improvement just relate to some buginess in the 
default CFQ implementation that I just haven't encountered.


That's the perception I get. CFQ is the default scheduler, but in most 
systems I have seen, it performs worse than the other three schedulers, 
all of which seem to have identical performance. I would avoid 
anticipatory on a RAID array though.


It seems to me that CFQ is simply bandwidth limited by the extra 
processing it has to perform.


Matthew

--
Experience is what allows you to recognise a mistake the second time you
make it.



Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-15 Thread Matthew Wakeling

On Thu, 14 Jan 2010, fka...@googlemail.com wrote:

The data needs to be written first to the WAL, in order to provide
crash-safety. So you're actually writing 1600MB, not 800.


I understand. So the actual throughput is 32MB/s which is
closer to 43 MB/s, of course.

Can I verify that by temporarily disabling WAL writes
completely and see if the thoughput is then doubled?


There isn't a magic setting in Postgres to disable the WAL. That would be 
far too tempting, and an easy way to break the database.


However, what you can do is to insert the data into the table in the same 
transaction as creating the table. Then Postgres knows that no other 
transactions can see the table, so it doesn't need to be so careful.
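
A minimal sketch of that pattern (the table name and file path are made up;
the shortcut applies to bulk loads such as COPY, and is most effective when
WAL archiving is not enabled):

BEGIN;
CREATE TABLE blob_copy (id integer, payload bytea);
-- Loaded in the same transaction as the CREATE TABLE, so no other
-- transaction can ever see this data before commit:
COPY blob_copy (id, payload) FROM '/tmp/blobs.dat';
COMMIT;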


Unfortunately, I don't think even this strategy will work in your case, as 
you will be writing to the large object table, which already exists. Could 
someone who knows confirm this?


Matthew

--
Let's say I go into a field and I hear baa baa baa. Now, how do I work 
out whether that was baa followed by baa baa, or if it was baa baa

followed by baa?
- Computer Science Lecturer



Re: [PERFORM] new server I/O setup

2010-01-15 Thread Matthew Wakeling

On Thu, 14 Jan 2010, Scott Marlowe wrote:

I've just received this new server:
1 x XEON 5520 Quad Core w/ HT
8 GB RAM 1066 MHz
16 x SATA II Seagate Barracuda 7200.12
3ware 9650SE w/ 256MB BBU

2 discs in RAID 1 for OS + pg_xlog partitioned with ext2.
12 discs in RAID 10 for postgres data, sole partition with ext3.
2 spares


I think your first choice is right.  I use the same basic setup with
147G 15k5 SAS seagate drives and the pg_xlog / OS partition is almost
never close to the same level of utilization, according to iostat, as
the main 12 disk RAID-10 array is.  We may have to buy a 16 disk array
to keep up with load, and it would be all main data storage, and our
pg_xlog main drive pair would be just fine.


The benefits of splitting off a couple of discs for WAL are dubious given 
the BBU cache, given that the cache will convert the frequent fsyncs to 
sequential writes anyway. My advice would be to test the difference. If 
the bottleneck is random writes on the 12-disc array, then it may actually 
help more to improve that to a 14-disc array instead.


I'd also question whether you need two hot spares, with RAID-10. Obviously 
that's a judgement call only you can make, but you could consider whether 
it is sufficient to just have a spare disc sitting on a shelf next to the 
server rather than using up a slot in the server. Depends on how quickly 
you can get to the server on failure, and how important the data is.


Matthew

--
In the beginning was the word, and the word was unsigned,
and the main() {} was without form and void...


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-15 Thread Matthew Wakeling

On Thu, 14 Jan 2010, fka...@googlemail.com wrote:

Nevertheless: If your explanation covers all what can be
said about it then replacing the hard disk by a faster one
should increase the performance here (I'll try to check that
out).


Probably. However, it is worth you running the test again, and looking at 
how busy the CPU on the machine is. The disc may be the bottleneck, or the 
CPU may be the bottleneck.


Matthew

--
Take care that thou useth the proper method when thou taketh the measure of
high-voltage circuits so that thou doth not incinerate both thee and the
meter; for verily, though thou has no account number and can be easily
replaced, the meter doth have one, and as a consequence, bringeth much woe
upon the Supply Department.   -- The Ten Commandments of Electronics



Re: [PERFORM] new server I/O setup

2010-01-15 Thread Matthew Wakeling

On Fri, 15 Jan 2010, Fernando Hevia wrote:

I was wondering if disabling the bbu cache on the RAID 1 array would make
any difference. All 256MB would be available for the random I/O on the RAID
10.


That would be pretty disastrous, to be honest. The benefit of the cache is 
not only that it smooths random access, but it also accelerates fsync. The 
whole point of the WAL disc is for it to be able to accept lots of fsyncs 
very quickly, and it can't do that without its BBU cache.


Matthew

--
Heat is work, and work's a curse. All the heat in the universe, it's
going to cool down, because it can't increase, then there'll be no
more work, and there'll be perfect peace.  -- Michael Flanders



Re: [PERFORM] a heavy duty operation on an unused table kills my server

2010-01-15 Thread Matthew Wakeling

On Fri, 15 Jan 2010, Craig James wrote:
That's the perception I get. CFQ is the default scheduler, but in most 
systems I have seen, it performs worse than the other three schedulers, all 
of which seem to have identical performance. I would avoid anticipatory on 
a RAID array though.


I thought the best strategy for a good RAID controller was NOOP.


Agreed. That's what we use here. My observation is though that noop is 
identical in performance to anticipatory and deadline. Theoretically, it 
should be faster.


Matthew

--
Take care that thou useth the proper method when thou taketh the measure of
high-voltage circuits so that thou doth not incinerate both thee and the
meter; for verily, though thou has no account number and can be easily
replaced, the meter doth have one, and as a consequence, bringeth much woe
upon the Supply Department.   -- The Ten Commandments of Electronics



Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread Matthew Wakeling

On Thu, 14 Jan 2010, fka...@googlemail.com wrote:

This takes about 50s, so, 800MB/50s = 16MB/s.

However the harddisk (sata) could write 43 MB/s in the worst
case! Why is write performance limited to 16 MB/s?


Several reasons:

The data needs to be written first to the WAL, in order to provide 
crash-safety. So you're actually writing 1600MB, not 800.


Postgres needs to update a few other things on disc (indexes on the large 
object table maybe?), and needs to call fsync a couple of times. That'll 
add a bit of time.


Your discs can't write 43MB/s in the *worst case* - the worst case is lots 
of little writes scattered over the disc, where it would be lucky to 
manage 1MB/s. Not all of the writes Postgres makes are sequential. A handy 
way of knowing how sequential the writes are is to listen to the disc as 
it writes - the clicking sounds are where it has to waste time moving the 
disc head from one part of the disc to another.


Matthew

--
No trees were killed in the sending of this message.  However a large
number of electrons were terribly inconvenienced.



Re: [PERFORM] Slow Select count(*) ... query on table with 60 Mio. rows

2010-01-14 Thread Matthew Wakeling

On Thu, 14 Jan 2010, tom wrote:

I have a db-table data_measurand with about 60,000,000 (60 million)
rows, and the following query takes about 20-30 seconds (with psql):

mydb=# select count(*) from data_measurand;
 count
--
60846187
(1 row)


Sounds pretty reasonable to me. Looking at your table, the rows are maybe 
200 bytes wide? That's 12GB of data for Postgres to munch through. 30 
seconds is really rather quick for that (400MB/s). What sort of RAID array 
is managing to give you that much?



I use a software raid and LVM for Logical Volume Management. Filesystem
is ext3


Ditch lvm.


This is an FAQ. Counting the rows in a table is an expensive operation in 
Postgres. It can't be answered directly from an index. If you want, you 
can keep track of the number of rows yourself with triggers, but beware 
that this will slow down write access to the table.
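
A minimal sketch of the trigger approach (the counter table and function
names are made up; the table name is taken from your example):

CREATE TABLE rowcount (tablename text PRIMARY KEY, cnt bigint NOT NULL);
INSERT INTO rowcount
    SELECT 'data_measurand', count(*) FROM data_measurand;

CREATE FUNCTION count_rows() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE rowcount SET cnt = cnt + 1 WHERE tablename = TG_TABLE_NAME;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE rowcount SET cnt = cnt - 1 WHERE tablename = TG_TABLE_NAME;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER data_measurand_count
    AFTER INSERT OR DELETE ON data_measurand
    FOR EACH ROW EXECUTE PROCEDURE count_rows();

Reading the count is then a single-row lookup instead of a 12GB scan:

SELECT cnt FROM rowcount WHERE tablename = 'data_measurand';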


Matthew

--
Nog: Look! They've made me into an ensign!
O'Brien: I didn't know things were going so badly.
Nog: Frightening, isn't it?



Re: [PERFORM] performance config help

2010-01-12 Thread Matthew Wakeling

On Mon, 11 Jan 2010, Bob Dusek wrote:

How do I learn more about the actual lock contention in my db?   Lock 
contention makes
some sense.  Each of the 256 requests are relatively similar.  So, I don't 
doubt that
lock contention could be an issue.  I just don't know how to observe it or 
correct it. 
It seems like if we have processes that are contending for locks, there's not 
much we can
do about it. 


To me:

1. This doesn't look like an IO bandwidth issue, as the database is small.
2. This doesn't look like a CPU speed issue, as usage is low.
3. This doesn't look like a memory bandwidth issue, as that would count as
   CPU active in top.
4. This doesn't look like a memory size problem either.

So, what's left? It could be a network bandwidth problem, if your client 
is on a separate server. You haven't really given much detail about the 
nature of the queries, so it is difficult for us to tell if the size of 
the results means that you are maxing out your network. However, it 
doesn't sound like it is likely to me that this is the problem.


It could be a client bottleneck problem - maybe your server is performing 
really well, but your client can't keep up. You may be able to determine 
this by switching on logging of long-running queries in Postgres, and 
comparing that with what your client says. Also, look at the resource 
usage on the client machine.


It could be a lock contention problem. To me, this feels like the most 
likely. You say that the queries are similar. If you are reading and 
writing from a small set of the same objects in each of the transactions, 
then you will suffer badly from lock contention, as only one backend can 
be in a read-modify-write cycle on a given object at a time. We don't know 
enough about the data and queries to say whether this is the case. 
However, if you have a common object that every request touches (like a 
status line or something), then try to engineer that out of the system.
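
As for observing it: a quick first look is to check for ungranted locks while
the system is under load - a rough sketch, not a complete diagnosis:

SELECT locktype, relation::regclass AS relation, pid, mode, granted
FROM pg_locks
WHERE NOT granted;

If that regularly returns rows pointing at the same relation, that relation is
where the backends are queueing.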


Hope this helps. Synchronising forty processes around accessing a single 
object for high performance is really hard, and Postgres does it 
incredibly well, but it is still by far the best option to avoid 
contention completely if possible.



  -Kevin


It'd really help us reading your emails if you could make sure that it is 
easy to distinguish your words from words you are quoting. It can be very 
confusing reading some of your emails, trying to remember which bits I 
have seen before written by someone else. This is one of the few lines 
that I know you didn't write - you're a Bob, not a Kevin. A few >
characters at the beginning of lines, which most mail readers will add
automatically, make all the difference.


Matthew

--
Me... a skeptic?  I trust you have proof?


Re: [PERFORM] performance config help

2010-01-12 Thread Matthew Wakeling

On Tue, 12 Jan 2010, Bob Dusek wrote:

Each of the concurrent clients does a series of selects, inserts, updates,
and deletes.  The requests would generally never update or delete the same
rows in a table.  However, the requests do generally write to the same
tables.  And, they are all reading from the same tables that they're writing
to.  For the inserts, I imagine they are blocking on access to the sequence
that controls the primary keys for the insert tables.


I'm going to have to bow out at this stage, and let someone else who knows 
more about what gets locked in a transaction help instead. The sequence 
may be significant, but I would have thought it would have to be something 
a bit bigger that is gumming up the works.



I'm really sorry.  I'm using gmail's interface.


Actually, you weren't doing anything particularly wrong as it turns out. 
It is partly a case of alpine being too clever for its own good, just as 
Kevin pointed out. My mail reader is taking the most preferred mime 
alternative, which is the HTML version, and interpreting it to its best 
ability, which isn't very well. It is the email that says which 
alternative is preferred, by the way. I have just forced alpine to view 
the plain text version instead, and it is much better.


I just saw the  Plain Text formatter at the top of this compose 
message.  But, if I convert it to Plain Text now, I may lose my portion 
of the message.  I'll use the Plain Text when posting future messages.


To be honest, that's always a good idea, although you didn't actually do 
wrong. I do know people whose spam filters immediately discard emails that 
contain a HTML alternative - that's taking it to the extreme!


Matthew

--
Beware of bugs in the above code; I have only proved it correct, not
tried it.   --Donald Knuth



Re: [PERFORM] Choice of bitmap scan over index scan

2010-01-11 Thread Matthew Wakeling

On Mon, 11 Jan 2010, Mathieu De Zutter wrote:

 seq_page_cost = 0.1
 random_page_cost = 0.1



So if this query usually does *not* hit the cache, it will be probably faster 
if I leave
it like that? While testing a query I execute it that much that it's always 
getting into
the cache. However, since other applications run on the same server, I think 
that
infrequently used data gets flushed after a while, even if the DB could fit in 
the RAM.


Postgres is being conservative. The plan it uses (bitmap index scan) will 
perform much better than an index scan when the data is not in the cache, 
by maybe an order of magnitude, depending on your hardware setup.


The index scan may perform better at the moment, but the bitmap index scan 
is safer.
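
If you want to compare the two plans on your own data, you can forbid the
bitmap plan for a single session - a sketch with a placeholder query:

SET enable_bitmapscan = off;
EXPLAIN ANALYZE SELECT * FROM my_table WHERE my_column = 42;
RESET enable_bitmapscan;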


Matthew

--
Change is inevitable, except from vending machines.



Re: [PERFORM] Digesting explain analyze

2010-01-07 Thread Matthew Wakeling

On Thu, 7 Jan 2010, Jesper Krogh wrote:

If disk seeks are killing you a kinda crazy idea would be to
duplicate the table - clustering one by (id1) and
the other one by an index on (id2) and unioning the
results of each.


That's doubling the disk space needs for the table. What are the odds
that this would be a benefit when the initial table significantly exceeds
available memory by itself?


If the table already greatly exceeds the available RAM, then doubling the 
amount of data won't make a massive difference to performance. You're 
going to disc for everything anyway.



Since each of these duplicates of the table will be clustered
by the column you're querying it on, it should just take one
seek in each table.

Then your query could be something like

  select * from (
select * from t1 where id1=2067 order by evalue limit 100
union
select * from t2 where id2=2067 order by evalue limit 100
  ) as foo order by evalue limit 100;


This is actually what I ended up with as the best performing query, just
still on a single table, because without duplication I can add index and
optimize this one by (id1,evalue) and (id2,evalue). It is still getting
killed quite a lot by disk IO. So I guess I'm up to:


You're kind of missing the point. The crucial step in the above suggestion 
is to cluster the table on the index. This will mean that all the rows 
that are fetched together are located together on disc, and you will no 
longer be killed by disc IO.
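
A sketch of that step (the index name is made up; on 8.3 and later the syntax
is CLUSTER table USING index):

CLUSTER table1 USING table1_id1_evalue_idx;
ANALYZE table1;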



1) By better disk (I need to get an estimate how large it actually is
going to get).


Unless you cluster, you are still going to be limited by the rate at which 
the discs can seek. Postgres 8.4 has some improvements here for bitmap 
index scans if you have a RAID array, and set the effective_io_concurrency 
setting correctly.



2) Stick with one table, but make sure to have enough activity to get a
large part of the index in the OS-cache anyway. (and add more memory if
nessesary).


In order to win here, you will need to make memory at least as big as the 
commonly-accessed parts of the database. This could get expensive.



I didnt cluster it, since clustering locks everything.


You can also test out the hypothesis by copying the table instead:

CREATE TABLE test1 AS SELECT * FROM table1 ORDER BY id1;

Then create an index on id1, and test against that table. The copy will 
become out of date quickly, but it will allow you to see whether the 
performance benefit is worth it. It will also tell you how long a cluster 
will actually take, without actually locking anything.


Matthew

--
In the beginning was the word, and the word was unsigned,
and the main() {} was without form and void...



Re: [PERFORM] Air-traffic benchmark

2010-01-07 Thread Matthew Wakeling

On Thu, 7 Jan 2010, Gurgel, Flavio wrote:
If one single query execution had a step that brought a page to the 
buffercache, it's enough to increase another step speed and change the 
execution plan, since the data access in memory is (usually) faster then 
disk.


Postgres does not change a query plan according to the shared_buffers 
setting. It does not anticipate one step contributing to another step in 
this way. It does however make use of the effective_cache_size setting to 
estimate this effect, and that does affect the planner.


The use of the index over a seqscan has to be tested. I don't agree with the
50% gain, since simple integers stored in a B-Tree have a huge possibility of
being retrieved in the required order, and the discarded data will be
discarded quickly too, so the gain has to be measured.


I bet that an index scan will be a lot faster, but it's just a bet :)


In a situation like this, the opposite will be true. If you were accessing 
a very small part of a table, say to order by a field with a small limit, 
then an index can be very useful by providing the results in the correct 
order. However, in this case, almost the entire table has to be read. 
Changing the order in which it is read will mean that the disc access is 
no longer sequential, which will slow things down, not speed them up. 
The Postgres planner isn't stupid (mostly), there is probably a good 
reason why it isn't using an index scan.



The table is very wide, which is probably why the tested databases can
deal with it faster than PG. You could try and narrow the table down
(for instance: remove the Div* fields) to make the data more
relational-like. In real life, speedups in this circumstances would
probably be gained by normalizing the data to make the basic table
smaller and easier to use with indexing.


Ugh. I don't think so. That's why indexes were invented. PostgreSQL is 
smart enough to jump over columns using byte offsets.

A better option for this table is to partition it in year (or year/month) 
chunks.


Postgres (mostly) stores the columns for a row together with a row, so 
what you say is completely wrong. Postgres does not jump over columns 
using byte offsets in this way. The index references a row in a page on 
disc, and that page is fetched separately in order to retrieve the row. 
The expensive part is physically moving the disc head to the right part of 
the disc in order to fetch the correct page from the disc - jumping over 
columns will not help with that at all.


Reducing the width of the table will greatly improve the performance of a 
sequential scan, as it will reduce the size of the table on disc, and 
therefore the time taken to read the entire table sequentially.


Moreover, your suggestion of partitioning the table may not help much with 
this query. It will turn a single sequential scan into a UNION of many 
tables, which may be harder for the planner to plan. Also, for queries 
that access small parts of the table, indexes will help more than 
partitioning will.


Partitioning will help most in the case where you want to summarise a 
single year's data. Not really otherwise.


Matthew

--
Q: What's the difference between ignorance and apathy?
A: I don't know, and I don't care.



Re: [PERFORM] pg_connect takes 3.0 seconds

2010-01-06 Thread Matthew Wakeling

On Wed, 6 Jan 2010, Dmitri Girski wrote:

On the other hand, if I use ip addresses this should not attract any possible 
issues with
DNS, right?


Not true. It is likely that the server program you are connecting to will 
perform a reverse DNS lookup to work out who the client is, for logging or 
authentication purposes.


Matthew

--
To err is human; to really louse things up requires root
privileges. -- Alexander Pope, slightly paraphrased



Re: [PERFORM] Idea how to get rid of Bitmap Heap Scan

2009-12-18 Thread Matthew Wakeling

On Fri, 18 Dec 2009, Michael N. Mikhulya wrote:

The problem here is that we are forced to fetch files in Bitmap Heap Scan.
But actually there is no need for the whole files record. The
necessary data is only files ids.

The idea is to avoid fetching data from files table, and get the ids
from index! (probably it is a little bit tricky, but it is a
performance area...)


Unfortunately, the index does not contain enough information to accomplish 
this. This is due to Postgres' advanced concurrency control system. 
Postgres needs to fetch the actual rows from the files table in order to 
check whether that row is visible in the current transaction, and a Bitmap 
Index Scan is the fastest way to do this.


You can speed this up in Postgres 8.4 by having a RAID array and setting 
the effective_io_concurrency configuration parameter to the number of spindles in the 
RAID array, or by having gobs of RAM and keeping everything in cache.
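
For example, on 8.4 (the value should roughly match the number of spindles):

SET effective_io_concurrency = 4;
-- or set it for everyone in postgresql.conf and reload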


Matthew

--
A good programmer is one who looks both ways before crossing a one-way street.
Considering the quality and quantity of one-way streets in Cambridge, it
should be no surprise that there are so many good programmers there.



Re: [PERFORM] Fw: Help me put 2 Gigs of RAM to use

2009-12-10 Thread Matthew Wakeling

On Thu, 10 Dec 2009, Mark Stosberg wrote:

What I'm noticing is that the while the FreeBSD server has 4 Gigs of
memory, there are rarely every more than 2 in use-- the memory use
graphs as being rather constant. My goal is to make good use of those 2
Gigs of memory to improve performance and reduce the CPU usage.


I think you'll find that the RAM is already being used quite effectively 
as disc cache by the OS. It sounds like the server is actually set up 
pretty well. You may get slightly better performance by tweaking a thing 
here or there, but the server needs some OS disc cache to perform well.



(We currently run 8.2, but are planning an upgrade to 8.4 soon).


Highly recommended.


[I tried to post this yesterday but didn't see it come through. This
message is a second attempt.)


The mailing list server will silently chuck any message whose subject 
starts with the word help, just in case you're asking for help about 
managing the mailing list. The default behaviour is not to inform you that 
it has done so. It is highly annoying - could a list admin please consider 
changing this?


Matthew

--
I would like to think that in this day and age people would know better than
to open executables in an e-mail. I'd also like to be able to flap my arms
and fly to the moon.-- Tim Mullen



Re: [PERFORM] Optimizing Bitmap Heap Scan.

2009-12-08 Thread Matthew Wakeling

On Tue, 8 Dec 2009, niraj patel wrote:

 Group  (cost=509989.19..511518.30 rows=9 width=10) (actual time=1783.102..2362.587 rows=261 loops=1)
   ->  Sort  (cost=509989.19..510753.74 rows=305821 width=10) (actual time=1783.097..2121.378 rows=272211 loops=1)
         Sort Key: topfamilyid
         ->  Bitmap Heap Scan on cmrules r  (cost=14501.36..476896.34 rows=305821 width=10) (actual time=51.507..351.487 rows=272211 loops=1)
               Recheck Cond: (workspaceid = 18512::numeric)
               ->  Bitmap Index Scan on pk_ws_fea_fam_cmrules  (cost=0.00..14424.90 rows=305821 width=0) (actual time=48.097..48.097 rows=272211 loops=1)
                     Index Cond: (workspaceid = 18512::numeric)
 Total runtime: 2373.008 ms
(8 rows)



select count(*) from  cmrules;

Gives me 17 643 532 Rows


Looks good from here. Think about what you're asking the database to do. 
It has to select 272211 rows out of a large table with 17643532 rows. That 
in itself could take a very long time. It is clear that in your EXPLAIN 
this data is already cached, otherwise it would have to perform nigh on 
27 seeks over the discs, which would take (depending on the disc 
system) something on the order of twenty minutes. Those 272211 rows then 
have to be sorted, which takes a couple of seconds, which again is pretty 
good. The rows are then uniqued, which is really quick, before returning 
the results.


It's hard to think how you would expect the database to do this any 
faster, really.



Indexes:
    pk_ws_fea_fam_cmrules PRIMARY KEY, btree (workspaceid, featureid, topfamilyid, ruleenddate, gid)
    idx_cmrules btree (topfamilyid)
    idx_gid_ws_cmrules btree (gid, workspaceid)


You may perhaps benefit from an index on just the workspaceid column, but 
the benefit may be minor.


You may think of clustering the table on the index, but that will only be 
of benefit if the data is not in the cache.


The statistics seem to be pretty accurate, predicting 305821 instead of 
272211 rows. The database is not going to easily predict the number of 
unique results (9 instead of 261), but that doesn't affect the query plan 
much, so I wouldn't worry about it.


I would consider upgrading to Postgres 8.4 if possible, as it does have 
some considerable performance improvements, especially for bitmap index 
scans if you are using a RAID array. I'd also try using SELECT DISTINCT 
rather than GROUP BY and seeing if that helps.
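
Those two suggestions, written out against the names visible in your plan (the
index name is made up; treat this as something to test rather than a
guaranteed win):

CREATE INDEX idx_cmrules_workspaceid ON cmrules (workspaceid);

SELECT DISTINCT topfamilyid
FROM cmrules
WHERE workspaceid = 18512;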


Matthew

--
Now the reason people powdered their faces back then was to change the values
s and n in this equation here. - Computer science lecturer


Re: [PERFORM] Optimizing Bitmap Heap Scan.

2009-12-08 Thread Matthew Wakeling

On Tue, 8 Dec 2009, niraj patel wrote:
Thanks very much for the analysis. It does takes 17 sec to execute when 
data is not in cache.


It sounds like the table is already very much ordered by the workspaceid, 
otherwise this would have taken much longer.


What I would like to ask can partitioning around workspaceid would help? 
Or any sort of selective index would help me.


Depends on how many distinct values of workspaceid there are. I would 
suggest that given how well ordered your table is, and if you aren't doing 
too many writes, then there would be little benefit, and much hassle.


Matthew

--
Now, you would have thought these coefficients would be integers, given that
we're working out integer results. Using a fraction would seem really
stupid. Well, I'm quite willing to be stupid here - in fact, I'm going to
use complex numbers.-- Computer Science Lecturer



Re: [PERFORM] Order by (for 15 rows) adds 30 seconds to query time

2009-12-02 Thread Matthew Wakeling

On Tue, 1 Dec 2009, Jean-Michel Pouré wrote:

PostgreSQL query analyzer needs to run a couple of times before it can
rewrite and optimize the query. Make sure demand_id, id and join IDs
carry indexes.


Huh? At what point does the planner carry over previous plans and use them 
to further optimise the query?


But perhaps the biggest factor here is calling a five table join a pretty 
simple query.


Matthew

--
Prolog doesn't have enough parentheses. -- Computer Science Lecturer


Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-26 Thread Matthew Wakeling

On Wed, 25 Nov 2009, Grzegorz Jaśkiewicz wrote:

the out of order data layout is primary reason for index bloat. And that 
happens , and
gets worse over time once data is more and more distributed. (random deletes, 
etc).


That's not index bloat. Sure, having the table not in the same order as 
the index will slow down an index scan, but that's a completely different 
problem altogether.


Index bloat is caused by exactly the same mechanism as table bloat. The 
index needs to have an entry for every row in the table that may be 
visible by anyone. As with the table, it is not possible to 
deterministically delete the rows as they become non-visible, so the 
index (and the table) will be left with dead entries on delete and update. 
The vacuum command performs garbage collection and marks these dead rows 
and index entries as free, so that some time in the future more data can 
be written to those places.


Index bloat is when there is an excessive amount of dead space in an 
index. It can be prevented by (auto)vacuuming regularly, but can only be 
reversed by REINDEX (or of course deleting the index, or adding loads of 
new entries to fill up the dead space after vacuuming).
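
For example (the index and table names are placeholders; note that REINDEX
locks the table, so schedule it accordingly):

REINDEX INDEX my_bloated_idx;
-- or rebuild every index on a table in one go:
REINDEX TABLE my_table;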


Matthew

--
for a in past present future; do
  for b in clients employers associates relatives neighbours pets; do
  echo The opinions here in no way reflect the opinions of my $a $b.
done; done


Re: [PERFORM] Strange performance degradation

2009-11-25 Thread Matthew Wakeling

On Tue, 24 Nov 2009, Denis Lussier wrote:

Bouncing the app will roll back the transactions.


Depends on the application. Some certainly use a shutdown hook to flush 
data out to a database cleanly.


Obviously if you kill -9 it, then all bets are off.

Matthew

--
Software suppliers are trying to make their software packages more
'user-friendly' Their best approach, so far, has been to take all
the old brochures, and stamp the words, 'user-friendly' on the cover.
-- Bill Gates



Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-25 Thread Matthew Wakeling

On Wed, 25 Nov 2009, Richard Neill wrote:

On Sun, 22 Nov 2009, Richard Neill wrote:

Worse still, doing a cluster of most of the tables and vacuum full analyze


Why are you doing a vacuum full? That command is not meant to be used 
except in the most unusual of circumstances, as it causes bloat to indexes.


We'd left it too long, and the DB was reaching 90% of disk space. I
didn't realise that vacuum full was ever actively bad, only sometimes
unneeded. I do now - thanks for the tip.


The problem is that vacuum full does a full compact of the table, but it 
has to update all the indexes as it goes. This makes it slow, and causes 
bloat to the indexes. There has been some discussion of removing the 
command or at least putting a big warning next to it.



So, having managed to bloat the indexes in this way, what can I do to
fix it? Will a regular vacuum do the job?


In fact, cluster is exactly the command you are looking for. It will drop 
the indexes, do a complete table rewrite (in the correct order), and then 
recreate all the indexes again.


In normal operation, a regular vacuum will keep the table under control, 
but if you actually want to shrink the database files in exceptional 
circumstances, then cluster is the tool for the job.


Matthew

--
Matthew: That's one of things about Cambridge - all the roads keep changing
  names as you walk along them, like Hills Road in particular.
Sagar:   Yes, Sidney Street is a bit like that too.
Matthew: Sidney Street *is* Hills Road.



Re: [PERFORM] [GENERAL] Strange performance degradation

2009-11-24 Thread Matthew Wakeling

On Mon, 23 Nov 2009, Lorenzo Allegrucci wrote:

Anyway, how can I get rid those idle in transaction processes?
Can I just kill -15 them or is there a less drastic way to do it?


Are you crazy? Sure, if you want to destroy all of the changes made to the 
database in that transaction and thoroughly confuse the client 
application, you can send a TERM signal to a backend, but the consequences 
to your data are on your own head.
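
It is worth at least identifying the culprits first, so that the application
can be fixed - a sketch for 8.x, where the relevant columns are procpid and
current_query:

SELECT procpid, usename, current_query, query_start
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction';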


Fix the application, don't tell Postgres to stop being a decent database.

Matthew

--
I would like to think that in this day and age people would know better than
to open executables in an e-mail. I'd also like to be able to flap my arms
and fly to the moon.-- Tim Mullen



Re: [PERFORM] Strange performance degradation

2009-11-24 Thread Matthew Wakeling

On Tue, 24 Nov 2009, Denis Lussier wrote:

IMHO the client application is already confused and it's in Prod.
Shouldn't he perhaps terminate/abort the IDLE connections in Prod and
work on correcting the problem so it doesn't occur in Dev/Test??


The problem is, the connection isn't just IDLE - it is idle IN 
TRANSACTION. This means that there is quite possibly some data that has 
been modified in that transaction. If you kill the backend, then that will 
automatically roll back the transaction, and all of those changes would be 
lost.


I agree that correcting the problem in dev/test is the priority, but I 
would be very cautious about killing transactions in production. You don't 
know what data is uncommitted. The safest thing to do may be to bounce the 
application, rather than Postgres.


Matthew

--
All of this sounds mildly turgid and messy and confusing... but what the
heck. That's what programming's all about, really
   -- Computer Science Lecturer



[PERFORM] RAID card recommendation

2009-11-24 Thread Matthew Wakeling


We're about to purchase a new server to store some of our old databases, 
and I was wondering if someone could advise me on a RAID card. We want to 
make a 6-drive SATA RAID array out of 2TB drives, and it will be RAID 5 or 
6 because there will be zero write traffic. The priority is stuffing as 
much storage into a small 2U rack as possible, with performance less 
important. We will be running Debian Linux.


People have mentioned Areca as making good RAID controllers. We're looking 
at the Areca ARC-1220 PCI-Express x8 SATA II as a possibility. Does 
anyone have an opinion on whether it is a turkey or a star?


Another possibility is a 3-ware card of some description.

Thanks in advance,

Matthew

--
Now you see why I said that the first seven minutes of this section will have
you looking for the nearest brick wall to beat your head against. This is
why I do it at the end of the lecture - so I can run.
   -- Computer Science lecturer



Re: [PERFORM] Why is the query not using the index for sorting?

2009-11-23 Thread Matthew Wakeling

On Sun, 22 Nov 2009, Jonathan Blitz wrote:

I have a table with a number of columns.
 
I perform
 
Select *
from table
order by a,b
 
There is an index on a,b which is clustered (as well as indexes on a and b 
alone).
I have issued the cluster and analyze commands.


Did you analyse *after* creating the index and clustering, or before?

Matthew

--
[About NP-completeness] These are the problems that make efficient use of
the Fairy Godmother.-- Computer Science Lecturer


Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-23 Thread Matthew Wakeling

On Sun, 22 Nov 2009, Richard Neill wrote:

Worse still, doing a cluster of most of the tables and vacuum full analyze


Why are you doing a vacuum full? That command is not meant to be used 
except in the most unusual of circumstances, as it causes bloat to 
indexes.


If you have run a cluster command, then running vacuum full will make the 
table and index layout worse, not better.


Matthew

--
Riker: Our memory pathways have become accustomed to your sensory input.
Data:  I understand - I'm fond of you too, Commander. And you too Counsellor



Re: [PERFORM] SSD + RAID

2009-11-20 Thread Matthew Wakeling

On Thu, 19 Nov 2009, Greg Smith wrote:
This is why turning the cache off can tank performance so badly--you're going 
to be writing a whole 128K block no matter what if it's forced to disk without 
caching, even if it's just to write an 8K page to it.


Theoretically, this does not need to be the case. Now, I don't know what 
the Intel drives actually do, but remember that for flash, it is the 
*erase* cycle that has to be done in large blocks. Writing itself can be 
done in small blocks, to previously erased sites.


The technology for combining small writes into sequential writes has been 
around for 17 years or so in 
http://portal.acm.org/citation.cfm?id=146943&dl= so there really isn't any 
excuse for modern flash drives not giving really fast small writes.


Matthew

--
for a in past present future; do
  for b in clients employers associates relatives neighbours pets; do
  echo The opinions here in no way reflect the opinions of my $a $b.
done; done



Re: [PERFORM] Strange performance degradation

2009-11-20 Thread Matthew Wakeling

On Fri, 20 Nov 2009, Lorenzo Allegrucci wrote:

performance is degrading...



In normal conditions the postgres process uses about 3% of cpu time
but when is in degraded conditions it can use up to 25% of cpu time.


You don't really give enough information to determine what is going on 
here. This could be one of two situations:


1. You have a constant incoming stream of short-lived requests at a 
constant rate, and Postgres is taking eight times as much CPU to service 
it as normal. You're looking at CPU usage in aggregate over long periods 
of time. In this case, we should look at long running transactions and 
other slowdown possibilities.


2. You are running a complex query, and you look at top and see that 
Postgres uses eight times as much CPU as when it has been freshly started. 
In this case, the performance degradation could actually be that the 
data is more in cache, and postgres is able to process it eight times 
*faster*. Restarting Postgres kills the cache and puts you back at square 
one.


Which of these is it?

Matthew

--
Reality is that which, when you stop believing in it, doesn't go away.
 -- Philip K. Dick


