[PERFORM] Postgres query completion status?

2009-11-19 Thread Richard Neill

Dear All,

I've just joined this list, so let me first thank you in advance for 
your hospitality.


I'm having lots of trouble with various slow-running queries on a 
production system. I've tried all the "obvious" fixes: changing the 
query planner settings, checking for indexing, autovacuum, making sure 
the thing has heaps of memory (20GB), running on solid-state disks, etc.



1. Is there any way to debug or trace a running query? I think I have 
all the logging options turned on, but I'd like to see something like:

 "Currently reading 3452 rows from table_x, at 0.2 us per row" or
whatever, being really, really verbose in the logfiles.

Likewise, is there any way to check whether, for example, postgres is 
running out of work memory?
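The closest I've found so far is temporary-file logging, which at least
shows when a sort or hash spills past work_mem - a sketch, assuming
superuser access:

  SET log_temp_files = 0;          -- log every temporary file created
  SET client_min_messages = log;   -- echo the LOG lines to this session
  -- ...then run the suspect query and watch for "temporary file:" lines

Anything that overflows work_mem then shows up with the file's size.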




2. Is there any way, whatsoever, to get any kind of "progress bar" for a 
running query? I know these things have terrible faults, but anything 
monotonic would be better than running completely blind.


[There's a wonderful paper here:
http://pages.cs.wisc.edu/~naughton/includes/papers/multiquery.pdf
which seems to have got 90% of the way there, but appears to have been 
abandoned before getting the last 10% of the way]



The operations people in the warehouse are currently going crazy because 
 we can't ever answer the question "when will this query complete?". I 
know it's hard to do accurately, but knowing the difference between "5 
seconds to go" and "5 hours to go" would be so fantastically useful.


Thanks,

Richard



P.S. Sometimes, some queries seem to benefit from being cancelled and 
then immediately re-started. As these are being run in a transaction, I 
can't see how this could make a difference. Am I missing anything 
obvious?  Occasionally, a re-start of postgresql-8.4 itself seems to help.






Re: [PERFORM] Postgres query completion status?

2009-11-19 Thread Richard Neill

Thanks for your help. This issue splits into 2 bits:

1. Fixing specific queries.

2. Finding out when a specific running query is going to complete.
(At the moment, this is the bit I really need to know).


Greg Williamson wrote:

Richard --

 You might post the results of "EXPLAIN ANALYZE <query>;" ... be sure to run it in a 
transaction if you want to be able to roll it back. Perhaps try "EXPLAIN <query>;" first, as it 
is faster, but EXPLAIN ANALYZE shows what the planner is actually doing.



The offending query (simplified to just do a select - which is the slow
bit) is:


-----------------------------------------------------------------------
SELECT ( core.demand.qty - viwcs.wave_end_demand.qty_remaining ) FROM
core.demand, viwcs.previous_wave LEFT OUTER JOIN viwcs.wave_end_demand
USING ( wid ) WHERE core.demand.id = viwcs.wave_end_demand.demand_id;



Over the last few weeks, this has gradually slowed down from 6 minutes
to about 6.5, then last night it took 25, and today it's taken an hour
already and still not completed. The system hasn't been doing anything
special in the last 2 days.



Here's EXPLAIN (EXPLAIN ANALYZE will take too long!):

-----------------------------------------------------------------------
 Nested Loop  (cost=885367.03..1123996.87 rows=8686 width=12)
   ->  Merge Join  (cost=885367.03..1115452.17 rows=8688 width=16)
         Merge Cond: ((core.demand.target_id = wave_genreorders_map.target_id) AND (core.demand.material_id = core.material.id))
         ->  Index Scan using demand_target_id_key on demand  (cost=0.00..186520.46 rows=3800715 width=24)
         ->  Sort  (cost=885364.61..893425.30 rows=3224275 width=24)
               Sort Key: wave_genreorders_map.target_id, core.material.id
               ->  Hash Join  (cost=511934.12..536811.73 rows=3224275 width=24)
                     Hash Cond: (core.material.tag = (product_info_sku.sid)::text)
                     ->  Append  (cost=0.00..10723.27 rows=689377 width=28)
                           ->  Seq Scan on material  (cost=0.00..5474.75 rows=397675 width=21)
                           ->  Seq Scan on container material  (cost=0.00..5248.52 rows=291702 width=37)
                     ->  Hash  (cost=506657.25..506657.25 rows=422149 width=42)
                           ->  Hash Join  (cost=474610.85..506657.25 rows=422149 width=42)
                                 Hash Cond: ((wave_gol.sid)::text = (product_info_sku.sid)::text)
                                 ->  Merge Left Join  (cost=463919.35..487522.78 rows=422149 width=29)
                                       Merge Cond: (((wave_gol.wid)::text = (du_report_sku.wid)::text) AND ((wave_gol.storeorderid)::text = (du_report_sku.storeorderid)::text) AND ((wave_gol.genreorderid)::text = (du_report_sku.genreorderid)::text))
                                       Join Filter: ((wave_gol.sid)::text = (du_report_sku.sid)::text)
                                       ->  Merge Join  (cost=183717.70..197229.24 rows=422149 width=44)
                                             Merge Cond: (((wave_genreorders_map.wid)::text = (wave_gol.wid)::text) AND ((wave_genreorders_map.storeorderid)::text = (wave_gol.storeorderid)::text) AND ((wave_genreorders_map.genreorderid)::text = (wave_gol.genreorderid)::text))
                                             ->  Index Scan using "wave_genreorders_map_ERR_GENREORDERID_EXISTS" on wave_genreorders_map  (cost=0.00..4015.36 rows=116099 width=27)
                                             ->  Sort  (cost=183717.70..184818.90 rows=440483 width=47)
                                                   Sort Key: wave_gol.wid, wave_gol.storeorderid, wave_gol.genreorderid
                                                   ->  Nested Loop  (cost=9769.36..142425.22 rows=440483 width=47)
                                                         ->  Index Scan using "wave_rxw_ERR_WID_EXISTS" on wave_rxw  (cost=0.00..7.08 rows=1 width=11)
                                                               Filter: is_previous
                                                         ->  Bitmap Heap Scan on wave_gol  (cost=9769.36..136912.11 rows=440483 width=36)
                                                               Recheck Cond: ((wave_gol.wid)::text = (wave_rxw.wid)::text)
                                                               ->  Bitmap Index Scan on "wave_gol_ERR_SID_EXISTS"  (cost=0.00..9659.24 rows=440483 width=0)
                                                                     Index Cond: ((wave_gol.wid)::text = (wave_rxw.wid)::text)
                                       ->  Sort  (cost=280201.66..281923.16 rows=688602 width=300)
                                             Sort Key: du_report_sku.wid, du_report_sku.storeorderid, du_report_sku.genreorderid
                                             ->  HashAggregate  (cost=197936.75..206544.27 rows=688602 width=36)

Re: [PERFORM] Postgres query completion status?

2009-11-19 Thread Richard Neill



Greg Williamson wrote:

Richard --

 You might post the results of "EXPLAIN ANALYZE <query>;" ... 
be sure to run it in a transaction if you want to be able to roll it 
back. Perhaps try "EXPLAIN <query>;" first, as it is faster, but 
EXPLAIN ANALYZE shows what the planner is actually doing.






Is there any way I can gather some information by tracing the query 
that's currently actually running?


strace doesn't help much, but is there anything else I can do?

As far as I know, the only tools that exist are
  pg_stat_activity, top, and iotop
Have I missed one?

Thanks,

Richard



Re: [PERFORM] Postgres query completion status?

2009-11-19 Thread Richard Neill


Greg Williamson wrote:

Richard --

 You might post the results of "EXPLAIN ANALYZE <query>;" ... 
be sure to run it in a transaction if you want to be able to roll it 
back. Perhaps try "EXPLAIN <query>;" first, as it is faster, but 
EXPLAIN ANALYZE shows what the planner is actually doing.



Here's something very very odd.
Explain Analyze has now run, in about 4 minutes.  (result below)

However, I'd be willing to swear that the last time I ran explain on 
this query about half an hour ago, the final 2 lines were sequential scans.


So, I've just terminated the real job (which uses this select for an 
update) after 77 minutes of fruitless cpu-hogging, and re-started it


...This time, the same job ran through in 24 minutes.
[This is running exactly the same transaction on exactly the same data!]


Richard





-----------------------------------------------------------------------
 Nested Loop  (cost=885367.03..1123996.87 rows=8686 width=12) (actual time=248577.879..253168.466 rows=347308 loops=1)
   ->  Merge Join  (cost=885367.03..1115452.17 rows=8688 width=16) (actual time=248577.834..252092.536 rows=347308 loops=1)
         Merge Cond: ((core.demand.target_id = wave_genreorders_map.target_id) AND (core.demand.material_id = core.material.id))
         ->  Index Scan using demand_target_id_key on demand  (cost=0.00..186520.46 rows=3800715 width=24) (actual time=0.031..2692.661 rows=3800715 loops=1)
         ->  Sort  (cost=885364.61..893425.30 rows=3224275 width=24) (actual time=248577.789..248659.751 rows=347308 loops=1)
               Sort Key: wave_genreorders_map.target_id, core.material.id
               Sort Method:  quicksort  Memory: 39422kB
               ->  Hash Join  (cost=511934.12..536811.73 rows=3224275 width=24) (actual time=247444.988..248263.151 rows=347308 loops=1)
                     Hash Cond: (core.material.tag = (product_info_sku.sid)::text)
                     ->  Append  (cost=0.00..10723.27 rows=689377 width=28) (actual time=0.008..177.076 rows=690647 loops=1)
                           ->  Seq Scan on material  (cost=0.00..5474.75 rows=397675 width=21) (actual time=0.008..59.234 rows=395551 loops=1)
                           ->  Seq Scan on container material  (cost=0.00..5248.52 rows=291702 width=37) (actual time=0.008..52.844 rows=295096 loops=1)
                     ->  Hash  (cost=506657.25..506657.25 rows=422149 width=42) (actual time=247444.555..247444.555 rows=347308 loops=1)
                           ->  Hash Join  (cost=474610.85..506657.25 rows=422149 width=42) (actual time=182224.904..247282.711 rows=347308 loops=1)
                                 Hash Cond: ((wave_gol.sid)::text = (product_info_sku.sid)::text)
                                 ->  Merge Left Join  (cost=463919.35..487522.78 rows=422149 width=29) (actual time=182025.065..246638.762 rows=347308 loops=1)
                                       Merge Cond: (((wave_gol.wid)::text = (du_report_sku.wid)::text) AND ((wave_gol.storeorderid)::text = (du_report_sku.storeorderid)::text) AND ((wave_gol.genreorderid)::text = (du_report_sku.genreorderid)::text))
                                       Join Filter: ((wave_gol.sid)::text = (du_report_sku.sid)::text)
                                       ->  Merge Join  (cost=183717.70..197229.24 rows=422149 width=44) (actual time=859.551..1506.515 rows=347308 loops=1)
                                             Merge Cond: (((wave_genreorders_map.wid)::text = (wave_gol.wid)::text) AND ((wave_genreorders_map.storeorderid)::text = (wave_gol.storeorderid)::text) AND ((wave_genreorders_map.genreorderid)::text = (wave_gol.genreorderid)::text))
                                             ->  Index Scan using "wave_genreorders_map_ERR_GENREORDERID_EXISTS" on wave_genreorders_map  (cost=0.00..4015.36 rows=116099 width=27) (actual time=0.018..23.599 rows=116099 loops=1)
                                             ->  Sort  (cost=183717.70..184818.90 rows=440483 width=47) (actual time=782.102..813.753 rows=347308 loops=1)
                                                   Sort Key: wave_gol.wid, wave_gol.storeorderid, wave_gol.genreorderid
                                                   Sort Method:  quicksort  Memory: 39422kB
                                                   ->  Nested Loop  (cost=9769.36..142425.22 rows=440483 width=47) (actual time=33.668..138.668 rows=347308 loops=1)
                                                         ->  Index Scan using "wave_rxw_ERR_WID_EXISTS" on wave_rxw  (cost=0.00..7.08 rows=1 width=11) (actual time=0.021..0.031 rows=1 loops=1)
                                                               Filter: is_previous
                                                         ->  Bitmap Heap Scan on wave_gol

Re: [PERFORM] SSD + RAID

2009-11-20 Thread Richard Neill

Axel Rau wrote:


Am 13.11.2009 um 14:57 schrieb Laszlo Nagy:

I was thinking about ARECA 1320 with 2GB memory + BBU. Unfortunately, 
I cannot find information about using ARECA cards with SSD drives.
They told me: currently not supported, but they have positive customer 
reports. No date yet for implementation of the TRIM command in firmware.

...
My other option is to buy two SLC SSD drives and use RAID1. It would 
cost about the same, but has less redundancy and less capacity. Which 
is the faster? 8-10 MLC disks in RAID 6 with a good caching 
controller, or two SLC disks in RAID1?


Despite my other problems, I've found that the Intel X25-Es work
remarkably well. The key issue for short, fast transactions seems to be
how fast an fdatasync() call can run, forcing the commit to disk, and
allowing the transaction to return to userspace.
With all the caches off, the Intel X25-E beat a standard disk by a
factor of about 10.
Attached is a short C program which may be of use.


For what it's worth, we have actually got a pretty decent (and
redundant) setup using a RAIS array of RAID1.


[primary server]

SSD }
    } RAID1 ---} DRBD ---- /var/lib/postgresql
SSD }             }
                  }
                  } gigE
                  }
[secondary server]}
                  }
SSD }             }
    } RAID1 ------}
SSD }



The servers connect back-to-back with a dedicated Gigabit ethernet
cable, and DRBD is running in protocol B.

We can pull the power out of one server, and be using the other within 30
seconds, with no data loss.


Richard



#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

#define NUM_ITER 1024

int main ( int argc, char **argv ) {
	const char data[] = "Liberate";
	size_t data_len = strlen ( data );
	const char *filename;
	int fd; 
	unsigned int i;

	if ( argc != 2 ) {
		fprintf ( stderr, "Syntax: %s output_file\n", argv[0] );
		exit ( 1 );
	}
	filename = argv[1];
	fd = open ( filename, ( O_WRONLY | O_CREAT | O_EXCL ), 0666 );
	if ( fd < 0 ) {
		fprintf ( stderr, "Could not create \"%s\": %s\n",
			  filename, strerror ( errno ) );
		exit ( 1 );
	}

	for ( i = 0 ; i < NUM_ITER ; i++ ) {
		if ( write ( fd, data, data_len ) != data_len ) {
			fprintf ( stderr, "Could not write: %s\n",
  strerror ( errno ) );
			exit ( 1 );
		}
		if ( fdatasync ( fd ) != 0 ) {
			fprintf ( stderr, "Could not fdatasync: %s\n",
  strerror ( errno ) );
			exit ( 1 );
		}
	}
	return 0;
}




Re: [PERFORM] Postgres query completion status?

2009-11-20 Thread Richard Neill


Kevin Grittner wrote:

Richard Neill  wrote:
 

SELECT ( core.demand.qty - viwcs.wave_end_demand.qty_remaining )
  FROM
core.demand,
viwcs.previous_wave
LEFT OUTER JOIN viwcs.wave_end_demand USING ( wid )
  WHERE core.demand.id = viwcs.wave_end_demand.demand_id;
 
For comparison, how does this do?:
 
SELECT (core.demand.qty - viwcs.wave_end_demand.qty_remaining)

  FROM core.demand
  JOIN viwcs.previous_wave
ON (core.demand.id = viwcs.wave_end_demand.demand_id)
  LEFT OUTER JOIN viwcs.wave_end_demand USING (wid);
 



Thanks for your help,

Unfortunately, it just complains:

ERROR:  missing FROM-clause entry for table "wave_end_demand"
LINE 4:   ON (core.demand.id = viwcs.wave_end_demand.demand_id)
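
Presumably the intent was something like the following sketch, which
joins wave_end_demand before referencing it (untested):

  SELECT ( core.demand.qty - wed.qty_remaining )
    FROM viwcs.previous_wave
    LEFT OUTER JOIN viwcs.wave_end_demand AS wed USING ( wid )
    JOIN core.demand ON core.demand.id = wed.demand_id;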

Incidentally, I don't think that this particular re-ordering will make
much difference: viwcs.previous_wave is a table with a single row, and 3
columns in it. Here are the bits of schema, if they're helpful.


   View "viwcs.wave_end_demand"
Column | Type  | Modifiers
---------------+-----------------------+-----------
 wid   | character varying(10) |
 storeorderid  | character varying(30) |
 genreorderid  | character varying(30) |
 target_id | bigint|
 sid   | character varying(30) |
 material_id   | bigint|
 demand_id | bigint|
 eqa   | integer   |
 aqu   | bigint|
 qty_remaining | bigint|
View definition:
 SELECT wave_gol.wid, wave_gol.storeorderid, wave_gol.genreorderid,
wave_genreorders_map.target_id, wave_gol.sid,
product_info_sku_map.material_id, demand.id AS demand_id, wave_gol.eqa,
COALESCE(du_report_sku_sum.aqu, 0::bigint) AS aqu, wave_gol.eqa -
COALESCE(du_report_sku_sum.aqu, 0::bigint) AS qty_remaining
   FROM viwcs.wave_gol
   LEFT JOIN viwcs.wave_genreorders_map USING (wid, storeorderid,
genreorderid)
   LEFT JOIN viwcs.product_info_sku_map USING (sid)
   LEFT JOIN core.demand USING (target_id, material_id)
   LEFT JOIN ( SELECT du_report_sku.wid, du_report_sku.storeorderid,
du_report_sku.genreorderid, du_report_sku.sid, sum(du_report_sku.aqu) AS aqu
   FROM viwcs.du_report_sku
  GROUP BY du_report_sku.wid, du_report_sku.storeorderid,
du_report_sku.genreorderid, du_report_sku.sid) du_report_sku_sum USING
(wid, storeorderid, genreorderid, sid);



 View "viwcs.previous_wave"
 Column | Type  | Modifiers
--------+-----------------------+-----------
 wid| character varying(10) |
View definition:
 SELECT wave_rxw.wid
   FROM viwcs.wave_rxw
  WHERE wave_rxw.is_previous;




  Table "core.demand"
   Column|  Type   |   Modifiers
-------------+---------+--------------------------------
 id  | bigint  | not null default core.new_id()
 target_id   | bigint  | not null
 material_id | bigint  | not null
 qty | integer | not null
 benefit | integer | not null default 0
Indexes:
"demand_pkey" PRIMARY KEY, btree (id)
"demand_target_id_key" UNIQUE, btree (target_id, material_id)
"demand_material_id" btree (material_id)
"demand_target_id" btree (target_id)
Foreign-key constraints:
"demand_material_id_fkey" FOREIGN KEY (material_id) REFERENCES
core.__material_id(id)
"demand_target_id_fkey" FOREIGN KEY (target_id) REFERENCES
core.waypoint(id)
Referenced by:
TABLE "core.inventory" CONSTRAINT "inventory_demand_id_fkey"
FOREIGN KEY (demand_id) REFERENCES core.demand(id)






Thanks,

Richard




Re: [PERFORM] Postgres query completion status?

2009-11-20 Thread Richard Neill



Thom Brown wrote:
 >

It looks like your statistics are way out of sync with the real data.

 > Nested Loop  (cost=885367.03..1123996.87 rows=8686 width=12) (actual 
time=248577.879..253168.466 rows=347308 loops=1)


This shows that it thinks there will be 8,686 rows, but actually 
traverses 347,308.


Yes, I see what you mean.



Have you manually run a VACUUM on these tables?  Preferably a full one 
if you can.  


Every night, it runs Vacuum verbose analyze on the entire database. We 
also have the autovacuum daemon enabled (in the default config).


About 2 weeks ago, I ran cluster followed by vacuum full - which seemed 
to help more than I'd expect.


[As I understand it, the statistics shouldn't change very much from day 
to day, as long as the database workload remains roughly constant. What 
we're actually doing is running a warehouse sorting books - so from one 
day to the next the particular book changes, but the overall statistics 
basically don't.]



I notice that you appear to have multiple sorts going on.
Are all of those actually necessary for your output?  


I think so. I didn't actually write all of this, so I can't be certain.

Also consider using partial or multicolumn indexes where useful.



Already done that. The query was originally pretty quick, with a few 
weeks worth of data, but not now. (after a few months). The times don't 
rise gradually, but have a very sudden knee.



And which version of PostgreSQL are you using?


8.4.1, including this patch:
http://archives.postgresql.org/pgsql-bugs/2009-10/msg00118.php


Richard



Re: [PERFORM] Postgres query completion status?

2009-11-20 Thread Richard Neill

Thom Brown wrote:



Okay, have you tried monitoring the connections to your database?

Try: select * from pg_stat_activity;


Tried that - it's very useful as far as it goes. I can see that in most 
cases, the DB is running just the one query.


What I really want to know is, how far through that query has it got?
(For example, if the query is an update, then surely it knows how many 
rows have been updated, and how many are yet to go).
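
The nearest available substitute seems to be elapsed time rather than
progress, e.g. (8.4 column names):

  SELECT procpid, now() - query_start AS runtime, waiting, current_query
    FROM pg_stat_activity
   WHERE current_query <> '<IDLE>'
   ORDER BY query_start;

which at least shows how long each backend has been at it, and whether
it is stuck waiting on a lock.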




And this to see current backend connections:

SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
   pg_stat_get_backend_activity(s.backendid) AS current_query
FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;



This looks identical to just some of the columns from pg_stat_activity.



It might also help if you posted your postgresql.conf too.


Below (have removed the really non-interesting bits).

Thanks,

Richard





Thom



#------------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#------------------------------------------------------------------------------

max_connections = 500   # (change requires restart)

#------------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#------------------------------------------------------------------------------

# - Memory -

shared_buffers = 4500MB # min 128kB
# (change requires restart)
temp_buffers = 64MB # min 800kB
#max_prepared_transactions = 0  # zero disables the feature
# (change requires restart)
# Note:  Increasing max_prepared_transactions costs ~600 bytes of shared
# memory per transaction slot, plus lock space (see
# max_locks_per_transaction).
# It is not advisable to set max_prepared_transactions nonzero unless you
# actively intend to use prepared transactions.

work_mem = 256MB# min 64kB
maintenance_work_mem = 256MB# min 1MB
max_stack_depth = 4MB   # min 100kB

# - Kernel Resource Usage -

#max_files_per_process = 1000   # min 25
# (change requires restart)
#shared_preload_libraries = ''  # (change requires restart)

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0ms# 0-100 milliseconds
#vacuum_cost_page_hit = 1   # 0-10000 credits
#vacuum_cost_page_miss = 10 # 0-10000 credits
#vacuum_cost_page_dirty = 20    # 0-10000 credits
#vacuum_cost_limit = 200    # 1-10000 credits

# - Background Writer -

#bgwriter_delay = 200ms # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100# 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0  # 0-10.0 multiplier on buffers 
scanned/round


# - Asynchronous Behavior -

#effective_io_concurrency = 1   # 1-1000. 0 disables prefetching


#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------

# - Settings -

#fsync = on # turns forced synchronization 
on or off

#synchronous_commit = on# immediate fsync at commit
#wal_sync_method = fsync# the default is the first option
# supported by the operating 
system:

#   open_datasync
#   fdatasync
#   fsync
#   fsync_writethrough
#   open_sync
#full_page_writes = on  # recover from partial page writes
wal_buffers = 2MB   # min 32kB
# (change requires restart)
#wal_writer_delay = 200ms   # 1-10000 milliseconds

commit_delay = 5# range 0-100000, in microseconds
commit_siblings = 5 # range 1-1000

# - Checkpoints -

checkpoint_segments = 64# in logfile segments, min 1, 
16MB each (was safe value of 4)

#checkpoint_timeout = 5min  # range 30s-1h
#checkpoint_completion_target = 0.5 # checkpoint target duration, 
0.0 - 1.0

#checkpoint_warning = 30s   # 0 disables

# - Archiving -

#archive_mode = off # allows archiving to be done
# (change requires restart)
#archive_command = ''   # command to use to archive a logfile 
segment

#archive_timeout = 0# force a logfile segment switch after this
# number of seconds; 0 disables


#------------------------------------------------------------------------------
# QUERY TUNING
#------------------------------------------------------------------------------

Re: [PERFORM] Postgres query completion status?

2009-11-20 Thread Richard Neill



Thom Brown wrote:

2009/11/20 Richard Neill <[email protected]>:


It might also help if you posted your postgresql.conf too.


Below (have removed the really non-interesting bits).

Thanks,

Richard


I can't actually see anything in your config that would cause this 
problem. :/


As for seeing the progress of an update, I would have thought due to the 
atomic nature of updates, only the transaction in which the update is 
running would have visibility of the as-yet uncommitted updates.




Yes, but surely the postmaster itself (and any administrative user) 
should be able to find this out.


What I need for slow queries is some kind of progress bar. Any estimate 
(no matter how poor, or non-linear) of the query progress, or time 
remaining would be invaluable.


Richard



Re: [PERFORM] Postgres query completion status?

2009-11-20 Thread Richard Neill



Fernando Hevia wrote:
 


-----Original Message-----
From: Richard Neill


max_connections = 500   # (change requires restart)
work_mem = 256MB# min 64kB


Not that it has to do with your current problem, but this combination could
bog down your server if enough clients run sorted queries simultaneously.
You should probably back off work_mem by at least an order of magnitude.



What's the correct way to configure this?

* We have one client which needs to run really big transactions 
(therefore needs the work memory).


* We also have about 200 clients which always run very small, short queries.

Richard



Re: [PERFORM] Postgres query completion status?

2009-11-20 Thread Richard Neill

Justin Pitts wrote:
Set work_mem in postgresql.conf down to what the 200 clients need, which 
sounds to me like the default setting.


In the session which needs more work_mem, execute:
SET SESSION work_mem TO '256MB'


Isn't that terribly ugly? It seems to me less hackish to rely on the 
many clients not to abuse work_mem (as we know exactly what query they 
will run, we can be sure it won't happen).


It's a shame that the work_mem parameter is a per-client one, rather 
than a single big pool.
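
For what it's worth, a per-role default would at least keep the hack out
of the client code - a sketch, using a hypothetical role name for the
big client:

  ALTER ROLE batch_client SET work_mem = '256MB';

Everyone else then keeps the modest setting from postgresql.conf.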


Richard



Re: [PERFORM] Postgres query completion status?

2009-11-22 Thread Richard Neill

Thanks very much for your help so far.


(it is pretty confusing that the HashAggregate reports ~6M rows, but
the sort does 41M rows; maybe I'm misreading this).
Anyway, I think that if you up the work_mem for this query to 512MB,
the sort will be done in memory, and thus be plenty faster.


Tried this (with work_mem = 2GB). It seems to make a difference, but not 
enough: the query time is about halved (from 220 sec to 120 sec).




Also, you say you are experiencing unstable query plans, and this may
mean that geqo is kicking in (but your query seems too simple for
that, even considering the views involved). A quick way to check
would be to run EXPLAIN <query> a couple of dozen times, and check
whether the plans change. If they do, try upping geqo_threshold.


It's not unstable from one run to the next; it's unstable from one day 
to the next (more on this later)




You have random_page_cost 4 times larger than seq_page_cost. You say
you are on SSD, so there is no random access penalty. Try setting them
equal.



Again, experimentally, it seems to be non-equal. I didn't benchmark 
this, but the random access tests done by TomsHardware et al suggest a 
factor 2.5 penalty for random access vs sequential. This is very much 
better than rotational disks, but still significant.
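
Both settings are easy to experiment with per-session before committing
anything to postgresql.conf, e.g.:

  SET seq_page_cost = 1.0;
  SET random_page_cost = 2.5;  -- the TomsHardware-style factor
  EXPLAIN SELECT ... ;         -- re-run the problem query, compare plans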




Your plan is full of merge-joins, some indices may be in order. Merge
join is a kind of "last-chance" plan.



I think the fix here is going to be to do more work at write-time and 
less at read-time. i.e. rather than having really complex views, we'll 
generate some extra tables, and keep them synchronized with triggers.
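
As a sketch of the idea (hypothetical names, untested; a crude
statement-level rebuild rather than a proper row-level update):

  CREATE TABLE viwcs.wave_end_demand_cache AS
      SELECT * FROM viwcs.wave_end_demand;

  CREATE OR REPLACE FUNCTION viwcs.refresh_wed_cache() RETURNS trigger AS $$
  BEGIN
      -- rebuild the whole cache; a real version would touch only the
      -- rows affected by the statement that fired the trigger
      TRUNCATE viwcs.wave_end_demand_cache;
      INSERT INTO viwcs.wave_end_demand_cache
          SELECT * FROM viwcs.wave_end_demand;
      RETURN NULL;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER wave_gol_refresh_wed
      AFTER INSERT OR UPDATE OR DELETE ON viwcs.wave_gol
      FOR EACH STATEMENT EXECUTE PROCEDURE viwcs.refresh_wed_cache();

Reads then hit the plain table (with whatever indexes suit them) instead
of the expensive view.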



Richard





Re: [PERFORM] Postgres query completion status?

2009-11-22 Thread Richard Neill



Justin Pitts wrote:
I don't know if I would call it "terribly" ugly. Its not especially 
pretty, but it affords the needed degree of twiddling to get the job 
done. Relying on the clients is fine - if you can. I suspect the vast 
majority of DBAs would find that notion unthinkable. The usual result of 
a memory overrun is a server crash.




It's probably OK in this context: the multiple clients are all instances 
of the same perl script, running particular, pre-defined queries. So we 
can trust them not to ask a really memory-intensive query.


Besides which, if you can't trust the clients to ask sensible queries, 
why can you trust them to set their own work_mem values?


Richard





On Nov 20, 2009, at 4:39 PM, Richard Neill wrote:


Justin Pitts wrote:
Set work_mem in postgresql.conf down to what the 200 clients need, 
which sounds to me like the default setting.

In the session which needs more work_mem, execute:
SET SESSION work_mem TO '256MB'


Isn't that terribly ugly? It seems to me less hackish to rely on the 
many clients not to abuse work_mem (as we know exactly what query they 
will run, we can be sure it won't happen).


It's a shame that the work_mem parameter is a per-client one, rather 
than a single big pool.


Richard








Re: [PERFORM] Postgres query completion status?

2009-11-22 Thread Richard Neill


Greg Smith wrote:

Richard Neill wrote:
Am I missing something though, or is this project dormant, without 
having released any files?


My bad--gave you the wrong url.  
http://git.postgresql.org/gitweb?p=pg_top.git;a=summary has the project 
I meant to point you toward.


Will try that out...




What I really want to know is, how far through that query has it got?
(For example, if the query is an update, then surely it knows how many 
rows have been updated, and how many are yet to go).
I understand what you want.  The reason you're not getting any 
suggestions is because that just isn't exposed in PostgreSQL yet.  
Clients ask for queries to be run, and eventually they get rows of results 
back, but there's no notion of how many they're going to get in advance, 
or how far along the executor is in the query's execution plan.  
There are a couple of academic projects that have started exposing more of 
the query internals, but I'm not aware of anyone who's even started 
moving in the direction of what you'd need to produce a progress bar.




Is there any internal table (similar to pg_stat_activity) I can look at?

Richard




[PERFORM] Query times change by orders of magnitude as DB ages

2009-11-22 Thread Richard Neill

Dear All,

Thanks for your help earlier with the previous question. I wonder if I 
might ask another.



We have various queries that need to run, of which I'm going to focus on 
2, "vox" and "du_report".


Both of them are extremely sensitive to the precise values of 
random_page_cost and seq_page_cost. Experimentally, I've used:


 A: seq_page_cost = 0.25; random_page_cost = 0.75
 B: seq_page_cost = 0.5;  random_page_cost = 2
 C: seq_page_cost = 1;    random_page_cost = 4

(and a few in between).


If I pick the wrong one, then either vox becomes 2 orders of magnitude 
slower (22ms -> 3.5 seconds), or du_report becomes 10x slower. I can't 
use the same setting for both.


So, as a very ugly hack, I've tuned the sweet spots for each query.
Vox normally sits at B; du_report at C.


Now, the real killer is that the position of that sweet spot changes 
over time as the DB ages over a few days (even though autovacuum is on).


Worse still, doing a cluster of most of the tables, followed by a vacuum 
full analyze, made most of the queries respond much better, but the vox 
query became very slow again, until I set it to A (which, a few days 
ago, did not work well).



*  Why is the query planner so precisely sensitive to the combination of 
page costs and time since last vacuum full?


* Why is it that what improves one query can make another get so much worse?

* Is there any way I can nail the query planner to a particular query 
plan, rather than have it keep changing its mind? (See the sketch below.)


* Is it normal to keep having to tune the query-planner's settings, or 
should it be possible to set it once, and leave it?
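
One blunt instrument, for experimenting with the third question: the
per-session enable_* flags, which discourage (but don't forbid) a plan
type:

  SET enable_seqscan = off;
  SET enable_nestloop = off;
  EXPLAIN SELECT ... ;  -- check the plan actually changed before trusting it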



Tuning this feels rather like adjusting several old radios, which are 
exceptionally finicky about the precise settings, having a very sharp 
resonance peak (in different places), and which drift out of tune at 
different rates. I must be doing something wrong, but what?


Thanks for your advice,

Richard





Re: [PERFORM] RAID card recommendation

2009-11-24 Thread Richard Neill

Matthew Wakeling wrote:


We're about to purchase a new server to store some of our old databases, 
and I was wondering if someone could advise me on a RAID card. We want 
to make a 6-drive SATA RAID array out of 2TB drives, and it will be RAID 
5 or 6 because there will be zero write traffic. The priority is 
stuffing as much storage into a small 2U rack as possible, with 
performance less important. We will be running Debian Linux.


People have mentioned Areca as making good RAID controllers. We're 
looking at the "Areca ARC-1220 PCI-Express x8 SATA II" as a possibility. 
Does anyone have an opinion on whether it is a turkey or a star?


Another possibility is a 3-ware card of some description.



Do you actually need a RAID card at all? It's just another point of 
failure: the Linux software raid (mdadm) is pretty good.


Also, be very wary of RAID5 for an array that size. It is highly 
probable that, if one disk has failed, you will lose a second disk during 
the recovery process. The unrecoverable read error rate on standard 
disks is about 1 in 10^14 bits; rebuilding your 6 x 2TB array means 
reading roughly 10^14 bits...

We got bitten by this.

Richard




Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-25 Thread Richard Neill



Matthew Wakeling wrote:

On Sun, 22 Nov 2009, Richard Neill wrote:
Worse still, doing a cluster of most of the tables and vacuum full 
analyze


Why are you doing a vacuum full? That command is not meant to be used 
except in the most unusual of circumstances, as it causes bloat to indexes.


We'd left it too long, and the DB was reaching 90% of disk space. I 
didn't realise that vacuum full was ever actively bad, only sometimes 
unneeded. I do now - thanks for the tip.




If you have run a cluster command, then running vacuum full will make 
the table and index layout worse, not better.




So, having managed to bloat the indexes in this way, what can I do to 
fix it? Will a regular vacuum do the job?


Richard



Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-25 Thread Richard Neill



Matthew Wakeling wrote:

On Wed, 25 Nov 2009, Richard Neill wrote:

On Sun, 22 Nov 2009, Richard Neill wrote:
Worse still, doing a cluster of most of the tables and vacuum full 
analyze


In fact, cluster is exactly the command you are looking for. It will 
drop the indexes, do a complete table rewrite (in the correct order), 
and then recreate all the indexes again.


In normal operation, a regular vacuum will keep the table under control, 
but if you actually want to shrink the database files in exceptional 
circumstances, then cluster is the tool for the job.




Thanks - now I understand.

In terms of just index bloat, does a regular vacuum help?

Richard



Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-25 Thread Richard Neill

Sergey Aleynikov wrote:

Hello,


* Is there any way I can nail the query planner to a particular query plan,
rather than have it keep changing its mind?


All these settings lead to choosing different plans. If you have a small
number of complex, sensitive queries, you can run EXPLAIN on them with
correct settings, then re-order the query (joins, subselects) according to
the given query plan, and, before running it, call

set local join_collapse_limit = 1;
set local from_collapse_limit = 1;


It's a simple query, but using a complex view. So I can't really 
re-order it.



This will prevent joins/subselects reordering inside the current
transaction block, leading to consistent plans. But that gives no 100%
guarantee of choosing, for example, a hash join over a nested loop.


Are you saying that this means that the query planner frequently makes 
the wrong choice here?




Worse still, doing a cluster of most of the tables and vacuum full 
analyze made most of the queries respond much better, but the vox query 
became very slow again, until I set it to A (which, a few days ago, did 
not work well).


Is your autovacuuming tuned correctly? For large tables, I set it
running much more aggressively than in the default install.


I hadn't changed it from the defaults; now I've changed it to:

autovacuum_max_workers = 6
autovacuum_vacuum_scale_factor = 0.002
autovacuum_analyze_scale_factor = 0.001

is that enough?

The DB isn't growing that much, but  it does seem to need frequent 
vacuum/analyze.



Richard




[PERFORM] How exactly does Analyze work?

2009-11-25 Thread Richard Neill

Dear All,

Thanks very much for your help so far. My understanding of PG is getting 
a lot better!


I wonder if I've understood analyze properly: I'm not sure I quite 
understand how specific the statistics gathered actually are.



In particular, what happens in the following case:
  1. I start with a table with 100 million rows, where column wid has
 linearly distributed values from 45-90.  (wid is indexed)

  2. I run vacuum analyze

  3. I insert about 2 million rows, all of which have the new wid of 91.

  4. I then do a select * WHERE wid = 91.

How smart is analyze? Will it actually say "well, I've never seen 91 in 
this table, because all the values only go up to 90, so you'd better do 
a sequential scan"?
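
One can at least inspect what analyze recorded - a sketch, with a
hypothetical table name:

  SELECT n_distinct, most_common_vals, histogram_bounds
    FROM pg_stats
   WHERE tablename = 'mytable' AND attname = 'wid';

If 91 lies beyond the last histogram bound, what does the planner assume?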



-----------------------------------------------------------------------

On another note, I notice that if I ever manually run vacuum or analyze, 
the performance of the database drops to the point where many of the 
operators get kicked out. Is there any way to run them "nice" ?


We need to maintain a response time of under 1 second all day for simple 
queries (which usually run in about 22ms). But Vacuum or Analyze seem to 
lock up the system for a few minutes, during which other queries block 
on them, although there is still plenty of CPU spare.


-----------------------------------------------------------------------


Also, I find that, even with the autovacuum daemon running, there was 
one query last night that I had to terminate after an hour. In 
desperation, I restarted postgres, let it take 15 mins to vacuum the 
entire DB, and then re-ran the query (in 8 minutes)


Any ideas how I can troubleshoot this better? The database is only 30GB 
in total - it should (if my intuition is right) be impossible that any 
simple select (even over a modestly complex view) should take longer 
than a multiple of the time required to read all the data from disk?




Thanks very much,

Richard











Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-26 Thread Richard Neill



Sergey Aleynikov wrote:

Hello,

2009/11/25 Richard Neill:

Also, if you find odd statistics of freshly analyzed table - try
increasing statistics target, using
ALTER TABLE .. ALTER COLUMN .. SET STATISTICS ...

If you're using the defaults, that's again too low for large tables. Start
with 200, for example.


Thanks. I already had it set way up: 3000.

Is there a good description of exactly what analyse does, and how?
(in particular, what sort of statistics it gathers).

Richard



[PERFORM] Analyse without locking?

2009-11-26 Thread Richard Neill

Dear All,

I'm wondering whether vacuum/analyse (notably by the autovacuum daemon) 
is responsible for some deadlocks/dropouts I'm seeing.


One particular table gets hit about 5 times a second (for single row 
updates and inserts) + associated index changes. This is a very light 
load for the hardware; we have 7 CPU cores idling, and very little disk 
activity. The query normally runs in about 20 ms.


However, the query must always respond within 200ms, or userspace gets 
nasty errors.  [we're routing books on a sorter machine, and the book 
misses its exit opportunity]. Although this is a low load, it's a bit 
like a heartbeat.


The question is, could the autovacuum daemon (running either in vacuum 
or in analyse mode) be taking out locks on this table that sometimes 
cause the query response time to go way up (exceeding 10 seconds)?


I think I've set up autovacuum to do "little and often", using
  autovacuum_vacuum_cost_delay = 20ms
  autovacuum_vacuum_cost_limit = 20
but I'm not sure this is doing exactly what I think it is. In 
particular, the system-wide I/O (and CPU) limit of autovacuum is 
negligible, but it's possible that queries may be waiting on locks.


In particular, I want to make sure that the autovacuum daemon never 
holds any lock for more than about 50ms at a time. (or will release it 
immediately if something else wants it)
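
To see whether lock waits are actually happening at the bad moments,
something like this rough self-join on pg_locks might do (it assumes the
contention is on a relation-level lock):

  SELECT blocked.pid AS blocked_pid, blocker.pid AS blocker_pid,
         blocked.mode AS wanted_mode, blocker.mode AS held_mode
    FROM pg_locks blocked
    JOIN pg_locks blocker ON blocked.relation = blocker.relation
                         AND blocked.pid <> blocker.pid
   WHERE NOT blocked.granted
     AND blocker.granted;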


Or am I barking up the wrong tree entirely?

Thanks,

Richard



Re: [PERFORM] Analyse without locking?

2009-11-28 Thread Richard Neill

Greg Smith wrote:
> Richard Neill wrote:
>> Or am I barking up the wrong tree entirely?
> If you haven't already tuned checkpoint behavior, it's more likely
> that's causing a dropout than autovacuum.  See the checkpoint_segments
> section of http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
> for an intro.

Thanks - I did that already - it's currently
   checkpoint_segments = 64

Now, I understand that increasing checkpoint_segments is generally a 
good thing (subject to some limit), but doesn't that just mean that 
instead of say a 1 second outage every minute, it's a 10 second outage 
every 10 minutes?


Also, correct me if I'm wrong, but mere selects shouldn't cause any 
addition to the WAL. I'd expect that a simple row insert might require 
perhaps 1kB of disk writes(*), in which case we're looking at only a few 
kB/sec of writes in normal use?


Is it possible (or even sensible) to do a manual vacuum analyze with 
nice/ionice?


Richard



(*)A typical write should be about 80 Bytes of data, in terms of how 
much is actually being stored. I'm using the engineers' "rule of 10" 
approximation to call that 1kB, based on indexes, and incomplete pages.





Re: [PERFORM] Analyse without locking?

2009-11-28 Thread Richard Neill

Thanks for your explanations.

Tom Lane wrote:

Richard Neill  writes:
Now, I understand that increasing checkpoint_segments is generally a 
good thing (subject to some limit), but doesn't that just mean that 
instead of say a 1 second outage every minute, it's a 10 second outage 
every 10 minutes?


In recent PG versions you can spread the checkpoint I/O out over a
period of time, so it shouldn't be an "outage" at all, just background
load.  Other things being equal, a longer checkpoint cycle is better
since it improves the odds of being able to coalesce multiple changes
to the same page into a single write.  The limiting factor is your
threshold of pain on how much WAL-replay work would be needed to recover
after a crash.


That makes sense. I think that 64 is sane - it means crash-recovery 
takes less than 1 minute, yet we aren't seeing the warning that 
checkpoints are too frequent.
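
For reference, the spreading Tom describes is controlled by
checkpoint_completion_target, which is still at its commented-out
default of 0.5 in the conf I posted earlier; raising it spreads each
checkpoint's writes over more of the cycle, e.g.:

  checkpoint_completion_target = 0.9  # finish writes at 90% of the interval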


Is it possible (or even sensible) to do a manual vacuum analyze with 
nice/ionice?


There's no support for that in PG.  You could try manually renice'ing
the backend that's running your VACUUM but I'm not sure how well it
would work; there are a number of reasons why it might be
counterproductive.  Fooling with the vacuum_cost_delay parameters is the
recommended way to make a vacuum run slower and use less of the machine.


I see why it might not work well - priority inversion etc.

What I was trying to achieve is to say that vacuum can have all the 
spare idle CPU/IO that's available, but must *immediately* back off when 
something else needs the CPU/IO/Locks.


For example,
  nice -n 20 yes > /dev/null
  ionice -c 3 dd if=/dev/zero > tmp.del

will both get quite a lot of work done on a medium-loaded system (try 
this on your own laptop), but have zero impact on other processes.


On the other hand, changing vacuum_cost_delay means that vacuum runs 
slowly even if the CPU is otherwise idle; yet it still impacts on the 
responsiveness of some queries.



Richard



[PERFORM] Order by (for 15 rows) adds 30 seconds to query time

2009-12-01 Thread Richard Neill

Dear All,

I don't know if this is a stupid question, or not, but I can't 
understand the following.


I have a pretty simple query, which runs in about 7ms

  SELECT * FROM h.inventory WHERE demand_id =289276563;


The result of this is a 15 row x 17 column table. However, I want this 
to be sorted by id, so I changed the query to:



   SELECT * FROM h.inventory WHERE demand_id =289276563 ORDER BY id;

which makes it take 32 seconds!


That surprises me - I'd expect the ORDER BY to be the last thing that 
runs, and for a sort of such a small dataset to be almost 
instantaneous. Indeed, if I do ORDER BY random(), then it's fast.


The system is running 8.4.1, and is otherwise lightly loaded, I can do 
this repeatedly with similar results.


Is this normal? Have I hit a bug?

I attach the view definition, the result set, and the output from 
explain analyze (both ways).


Thanks,

Richard




  View "h.inventory"
Column |   Type   | Modifiers
---------------+--------------------------+-----------
 id| bigint   |
 material_id   | bigint   |
 material_tag  | text |
 material_name | text |
 location_id   | bigint   |
 location_tag  | text |
 location_name | text |
 qty   | integer  |
 divergence| integer  |
 ctime | timestamp with time zone |
 actor_id  | bigint   |
 actor_tag | text |
 actor_name| text |
 demand_id | bigint   |
 target_id | bigint   |
 target_tag| text |
 target_name   | text |
View definition:
 SELECT inventory.id, inventory.material_id, h_material.tag AS 
material_tag, h_material.name AS material_name, inventory.location_id, 
h_location.tag AS location_tag, h_location.name AS location_name, 
inventory.qty, inventory.divergence, inventory.ctime, 
inventory.actor_id, h_actor.tag AS actor_tag, h_actor.name AS 
actor_name, inventory.demand_id, h_demand.target_id, 
h_demand.target_tag, h_demand.target_name

   FROM core.inventory
   LEFT JOIN h.material h_material ON inventory.material_id = h_material.id
   LEFT JOIN h.location h_location ON inventory.location_id = h_location.id
   LEFT JOIN h.actor h_actor ON inventory.actor_id = h_actor.id
   LEFT JOIN h.demand h_demand ON inventory.demand_id = h_demand.id;












    id     | material_id | material_tag | material_name | location_id | location_tag |     location_name      | qty | divergence |             ctime             | actor_id | actor_tag |     actor_name     | demand_id | target_id |   target_tag   |              target_name
-----------+-------------+--------------+---------------+-------------+--------------+------------------------+-----+------------+-------------------------------+----------+-----------+--------------------+-----------+-----------+----------------+----------------------------------------
 292904293 |   289238938 | 101113980    |               |   280410977 | 1030576      | Container 1030576      |   0 |          0 | 2009-12-01 14:53:35.010023+00 |     5543 | 139664    | Joanna Mikolajczak | 289276563 |      3153 | 300244 EMBARGO | 300244 308/09 HAYWARDS HEATH / EMBARGO
 292904294 |   289238938 | 101113980    |               |   280410977 | 1030576      | Container 1030576      |   1 |          0 | 2009-12-01 14:53:35.060378+00 |          |           |                    | 289276563 |      3153 | 300244 EMBARGO | 300244 308/09 HAYWARDS HEATH / EMBARGO
 292904292 |   289238938 | 101113980    |               |        4008 | 308          | Chute 308              |   0 |          0 | 2009-12-01 14:53:34.925801+00 |     5543 | 139664    | Joanna Mikolajczak | 289276563 |      3153 | 300244 EMBARGO | 300244 308/09 HAYWARDS HEATH / EMBARGO
 292817907 |   289238938 | 101113980    |               |        5137 | 991          | Chute 991 (not needed) |   0 |          0 | 2009-12-01 14:38:00.819189+00 |     6282 | CW 991    | Chute 991 worker   | 289276563 |      3153 | 300244 EMBARGO | 300244 308/09 HAYWARDS HEATH / EMBARGO
 291755251 |   289238938 | 101113980    |               |        5137 | 991          | Chute 991 (not needed) |   0 |          0 | 2009-12-01 12:03:05.957424+00 |     6282 | CW 991    | Chute 991 worker   | 289276563 |      3153 | 300244 EMBARGO | 300244 308/09 HAYWARDS HEATH / EMBARGO
 291543235 |   289238938 | 101113980    |               |        5137 | 991          | Chute 991 (not needed) |   0 |          0 | 2009-12-01 11:35:19.28846+00  |     6282 | CW 991    | Chute 991 worker   | 289276563 |      3153 | 300244 EMBARGO | 300244 308/09 HAYWARDS HEATH / EMBARGO
 291524046 |   289238938 | 101113980    |               |        4008 | 308          | Chute 308              |   0 |          0 | 2009-12

Re: [PERFORM] Order by (for 15 rows) adds 30 seconds to query time

2009-12-01 Thread Richard Neill



Jean-Michel Pouré wrote:

On Tuesday 01 December 2009 at 18:52, Richard Neill wrote:

Is this normal? Have I hit a bug?


PostgreSQL query analyzer needs to run a couple of times before it can
rewrite and optimize the query. Make sure demand_id, id and join IDs
carry indexes.



I did, and they do. This table has been in place for ages, with 
autovacuum on, and a manual vacuum analyze every night. I checked by 
running analyze explicitly on all the relevant tables just before 
posting this.



Run EXPLAIN ANALYSE your_query to understand how the parser works and
post it back here.



Already in previous email :-)


Kind regards,
Jean-Michel




Kevin Grittner wrote:
> Richard Neill  wrote:
>
>> I'd expect the ORDER BY to be the last thing that runs
>
>>   Nested Loop Left Join  (cost=0.00..727737158.77
>> rows=806903677108 width=195) (actual time=31739.052..32862.322
>> rows=15 loops=1)
>
> It probably would if it knew there were going to be 15 rows to sort.
> It is estimating that there will be 806,903,677,108 rows, in which
> case it thinks that using the index will be faster.  The question is
> why it's 10 or 11 orders of magnitude off on the estimate of result
> rows.  Could you show us the table definitions underlying that view?
>
> -Kevin
>


Am I wrong in thinking that ORDER BY is always applied after the main 
query is run?


Even if I run it this way:

select * from (select * from h.inventory where demand_id =289276563) as 
sqry order by id;


which should(?) surely force it to run the first select, then sort, it's 
still very slow. On the other hand, it's quick if I do order by id+1


The table definitions are as follows (sorry there are so many).


Richard







fswcs=# \d h.demand
   View "h.demand"
Column |  Type   | Modifiers
---------------+---------+-----------
 id| bigint  |
 target_id | bigint  |
 target_tag| text|
 target_name   | text|
 material_id   | bigint  |
 material_tag  | text|
 material_name | text|
 qty   | integer |
 benefit   | integer |
View definition:
 SELECT demand.id, demand.target_id, h_target_waypoint.tag AS 
target_tag, h_target_waypoint.name AS target_name, demand.material_id, 
h_material.tag AS material_tag, h_material.name AS material_name, 
demand.qty, demand.benefit

   FROM core.demand
   LEFT JOIN h.waypoint h_target_waypoint ON demand.target_id = 
h_target_waypoint.id

   LEFT JOIN h.material h_material ON demand.material_id = h_material.id;

fswcs=# \d core.demand
  Table "core.demand"
   Column|  Type   |   Modifiers
-------------+---------+--------------------------------
 id  | bigint  | not null default core.new_id()
 target_id   | bigint  | not null
 material_id | bigint  | not null
 qty | integer | not null
 benefit | integer | not null default 0
Indexes:
"demand_pkey" PRIMARY KEY, btree (id) CLUSTER
"demand_target_id_key" UNIQUE, btree (target_id, material_id)
"demand_material_id" btree (material_id)
"demand_target_id" btree (target_id)
Foreign-key constraints:
"demand_material_id_fkey" FOREIGN KEY (material_id) REFERENCES 
core.__material_id(id)
"demand_target_id_fkey" FOREIGN KEY (target_id) REFERENCES 
core.waypoint(id)

Referenced by:
TABLE "viwcs.du_report_contents" CONSTRAINT 
"du_report_contents_demand_id_fkey" FOREIGN KEY (demand_id) REFERENCES 
core.demand(id)
TABLE "core.inventory" CONSTRAINT "inventory_demand_id_fkey" 
FOREIGN KEY (demand_id) REFERENCES core.demand(id)
TABLE "viwcs.wave_demand" CONSTRAINT "wave_demand_demand_id_fkey" 
FOREIGN KEY (demand_id) REFERENCES core.demand(id)


fswcs=# \d h.waypoint
View "h.waypoint"
  Column   |  Type   | Modifiers
-----------+---------+-----------
 id| bigint  |
 tag   | text|
 name  | text|
 is_router | boolean |
 is_target | boolean |
 is_packer | boolean |
View definition:
 SELECT waypoint.id, waypoint.tag, waypoint.name, waypoint.is_router, 
waypoint.is_target, waypoint.is_packer

   FROM core.waypoint;

fswcs=# \d h.material
  View "h.material"
 Column |  Type   | Modifiers
--------+---------+-----------
 id | bigint  |
 tag| text|
 name   | text|
 mass   | integer |
 volume | integer |
View definition:
 SELECT material.id, material.tag, material.name, material.mass, 
material.volume

   FROM core.material;

fswcs=# \d core.wa
core.waypoint   core.waypoint_name_key  core.waypoint_pkey 
core.waypoint_tag_key

fswcs=# \d core.waypoint
Table "core.waypoint"
  Column   |  Type   |   Modifiers
-----------+---------+--------------------------------
 id| bigint  | not null default core.new_id()
 tag   |

Re: [PERFORM] Order by (for 15 rows) adds 30 seconds to query time

2009-12-01 Thread Richard Neill

Dear Kevin,

Thanks for a very helpful reply.

Kevin Grittner wrote:

Richard Neill  wrote:
 

Am I wrong in thinking that ORDER BY is always applied after the
main query is run?
 
Yes, you are wrong to think that.  It compares the costs of various

plans, and when it has an index with the high order portion matching
your ORDER BY clause, it may think that it can scan that index to
generate the correct sequence.  If the sort is estimated to be
expensive enough compared to the index scan, it will use the index
scan and skip the sort.  Sorting hundreds of billions of rows can be
expensive.
 


That makes sense now.




Even if I run it this way:

select * from (select * from h.inventory where demand_id
=289276563) as sqry order by id;

which should(?) surely force it to run the first select, then
sort,
 
I wouldn't necessarily assume that.  You can EXPLAIN that form of

the query and find out easily enough.  Does it say:
 
->  Index Scan using inventory_demand_id on

inventory  (cost=0.00..22.36 rows=28 width=56) (actual time=0.025..0.053
rows=15 loops=1)
  Index Cond: (demand_id = 289276563)
 
or:
 
->  Index Scan using inventory_pkey on

inventory  (cost=0.00..879728.20 rows=28 width=56) (actual
time=31738.956..32860.738 rows=15 loops=1)
  Filter: (demand_id = 289276563)
 

it's quick if I do order by id+1
 
You don't have an index on id+1.
 


Your explanation is validated by the explain - it only does the sort 
last if I use "order by id+1", where there is no index for that.


[Aside: using "id+0" also forces a sort; see below.]
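
So the work-around, while it's needed, is simply:

  SELECT * FROM h.inventory WHERE demand_id = 289276563 ORDER BY id + 0;

any expression on id defeats the index match, and the planner sorts the
15 rows instead.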


 
The real problem to solve here is that it's estimating the rows

count for the result so badly.  If you need a short-term
work-around, you've already discovered that you can keep it from
using the index on id for ordering by creating an expression using
id which causes it not to consider the index a match.  That's kind
of ugly to keep long term, though.
 


We seem to have a general problem of very bad query plans; in other 
cases too, explain analyze shows that the query planner's guesses are 
miles adrift.


Others have said that this is symptomatic of a lack of doing analyze, 
however we are doing quite a lot of analyzing (both through autovacuum, 
and a manual "vacuum verbose analyze" every night). Our underlying 
statistical distribution isn't that changeable.


Thanks,

Richard











Re: [PERFORM] Order by (for 15 rows) adds 30 seconds to query time

2009-12-02 Thread Richard Neill



Kevin Grittner wrote:

Tom Lane  wrote:
 

That does look weird.  Do we have a self-contained test case?


Not at the moment. It seems to only occur with relatively complex joins.

 
Richard, could you capture the schema for the affected tables and

views with pg_dump -s and also the related rows from pg_statistic?
(The actual table contents aren't needed to see this issue.)
 


Here are the relevant parts of the schema - I've cut this out of the 
source-tree rather than pg_dump, since it seems more readable.


Regarding  pg_statistic, I don't understand how to find the relevant 
rows - what am I looking for? (the pg_statistic table is 247M in size).


Thanks for your help,

Richard

THE PROBLEM QUERY
-

SELECT * FROM h.inventory WHERE demand_id = 289276563 ORDER BY id;

#Note that using core.inventory (which is simpler) does not have the 30-second problem.

#In general the h namespace is intended to be a human-readable diagnostic version whereas the core namespace
#is optimised for the application






h.inventory and core.inventory
--

--
-- Inventory
--
CREATE TABLE core.inventory (
	id bigint NOT NULL DEFAULT core.new_id(),
	material_id bigint NOT NULL,
	location_id bigint NOT NULL,
	qty integer NOT NULL,
	divergence integer NOT NULL DEFAULT 0,
	ctime timestamp with time zone NOT NULL DEFAULT now(),
	actor_id bigint NULL,
	demand_id bigint NULL,
	PRIMARY KEY ( id ),
	FOREIGN KEY ( material_id ) REFERENCES core.__material_id ( id ),
	FOREIGN KEY ( location_id ) REFERENCES core.__location_id ( id ),
	FOREIGN KEY ( actor_id ) REFERENCES core.actor ( id ),
	FOREIGN KEY ( demand_id ) REFERENCES core.demand ( id )
);
CREATE INDEX inventory_material_id ON core.inventory ( material_id );
CREATE INDEX inventory_location_id ON core.inventory ( location_id );
CREATE INDEX inventory_actor_id ON core.inventory ( actor_id );
CREATE INDEX inventory_demand_id ON core.inventory ( demand_id );
CREATE OR REPLACE VIEW h.inventory AS
SELECT	core.inventory.id,
	core.inventory.material_id,
	h_material.tag AS material_tag,
	h_material.name AS material_name,
	core.inventory.location_id,
	h_location.tag AS location_tag,
	h_location.name AS location_name,
	core.inventory.qty,
	core.inventory.divergence,
	core.inventory.ctime,
	core.inventory.actor_id,
	h_actor.tag AS actor_tag,
	h_actor.name AS actor_name,
	core.inventory.demand_id,
	h_demand.target_id,
	h_demand.target_tag,
	h_demand.target_name
FROM	core.inventory
	LEFT OUTER JOIN h.material AS h_material
		ON core.inventory.material_id = h_material.id
	LEFT OUTER JOIN h.location AS h_location
		ON core.inventory.location_id = h_location.id
	LEFT OUTER JOIN h.actor AS h_actor
		ON core.inventory.actor_id = h_actor.id
	LEFT OUTER JOIN h.demand AS h_demand
		ON core.inventory.demand_id = h_demand.id;





h.material and core.material


--
-- Weights, dimensions, and other material data
--

--
-- Materials
--
CREATE TABLE core.material (
	LIKE core.tag
		INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES,
	mass integer NOT NULL CHECK ( mass >= 0 ),
	volume integer NOT NULL CHECK ( volume >= 0 )
) INHERITS ( core.tag );
SELECT core.inherit_unique_index ( 'material', 'tag', 'id' );
SELECT core.inherit_unique_index ( 'material', 'tag', 'tag' );
SELECT core.create_inheritable_unique_index ( 'material', 'id', 'bigint' );
COMMENT ON COLUMN core.material.mass IS
	'Mass in grams';
COMMENT ON COLUMN core.material.volume IS
	'Volume in ml';
CREATE OR REPLACE VIEW h.material AS
SELECT	core.material.id,
	core.material.tag,
	core.material.name,
	core.material.mass,
	core.material.volume
FROM	core.material;












h.location and core.location


--
-- Locations
--
CREATE TABLE core.location (
	LIKE core.tag
		INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES
) INHERITS ( core.tag );
SELECT core.inherit_unique_index ( 'location', 'tag', 'id' );
SELECT core.inherit_unique_index ( 'location', 'tag', 'tag' );
SELECT core.create_inheritable_unique_index ( 'location', 'id', 'bigint' );
CREATE OR REPLACE VIEW h.location AS
SELECT	core.location.id,
	core.location.tag,
	core.location.name
FROM	core.location;







h.actor and core.actor
--

--
-- Actors
--
CREATE TABLE core.actor (
	LIKE core.tag
		INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES
) INHERITS ( core.tag );
SELECT core.inherit_unique_index ( 'actor', 'tag', 'id' );
SELECT core.inherit_unique_index ( 'actor', 'tag', 'tag' );
CREATE OR REPLACE VIEW h.actor AS
SELECT	core.actor.id,
	core.actor.tag,
	core.actor.name
FROM	core.actor;





h.demand and core.demand
-

--
-- Demand
--
CREATE TABLE core.demand (
	id bigint NOT NULL DEFAULT core.new_id(),
	target_id bigint NOT NULL,
	material_id bigint NOT NULL,
	qty integer NOT NULL,
	-- HACK
	benefit integer NOT NULL DEFAULT 0,
	PRIMARY KEY ( id ),
	UNIQUE ( target_id, material_id ),
	FOREIGN KEY ( target_id ) REFERENCE

Re: [PERFORM] Order by (for 15 rows) adds 30 seconds to query time

2009-12-02 Thread Richard Neill



Kevin Grittner wrote:

Richard Neill  wrote:
 

Regarding  pg_statistic, I don't understand how to find the
relevant rows - what am I looking for? (the pg_statistic table is
247M in size).
 
I think the only relevant rows would be the ones with starelid =

pg_class.oid for a table used in the query, and I think you could
further limit it to rows where staattnum = pg_attribute.attnum for a
column referenced in the WHERE clause or a JOIN's ON clause
(including in the views).  To help match them up, and to cover all
the bases, listing the related pg_class and pg_attribute rows would
help.
 
Hopefully that will allow us to generate the same plan in an

EXPLAIN, and then see how it gets such an overblown estimate of the
result rows.



Thanks for your explanation. I ran the query:

SELECT * from pg_statistic WHERE starelid IN
  (SELECT oid FROM pg_class where relname IN
('demand','waypoint','actor','location','material','inventory')
  );

and it's 228kB compressed, so rather than attaching it, I'm placing it 
here:  http://www.richardneill.org/tmp/pg_statistic.bz2



Likewise, the much smaller (16kB) output from:

SELECT * from pg_class where relname IN
  ('demand','waypoint','actor','location','material','inventory');

SELECT * from pg_attribute ;

is at: http://www.richardneill.org/tmp/pg_attribute_pg_class.bz2
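
(For anyone repeating this, a more targeted query along the lines Kevin
suggests might be the following - an untested sketch, which labels each
statistics row with its table and column name:)

SELECT c.relname, a.attname, s.*
  FROM pg_statistic s
  JOIN pg_class     c ON c.oid      = s.starelid
  JOIN pg_attribute a ON a.attrelid = s.starelid
                     AND a.attnum   = s.staattnum
  WHERE c.relname IN
    ('demand','waypoint','actor','location','material','inventory');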



P.S. Would it be easier for you if I set up SSH access to a spare 
machine, with a copy of the database?



Thanks very much for your help,

Richard

--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Analyse without locking?

2009-12-02 Thread Richard Neill

Dear All,

I'm still puzzled by this one - it looks like it's causing about 5% of 
queries to rise in duration from ~300ms to 2-6 seconds.


On the other hand, the system never seems to be I/O bound. (we have at 
least 25 MB/sec of write bandwidth, and use a small fraction of that 
normally).


Here's the typical checkpoint logs:

2009-12-03 06:21:21 GMT LOG:  checkpoint complete: wrote 12400 buffers 
(2.2%); 0 transaction log file(s) added, 0 removed, 12 recycled; 
write=149.883 s, sync=5.143 s, total=155.040 s


We're using 8.4.1, on ext4 with SSD. Is it possible that something 
exotic is occurring to do with write barriers (on by default in ext4, 
and we haven't changed this)?


Perhaps a low priority IO process for writing the previous WAL to disk 
is blocking a high-priority transaction (which is trying to write to the 
new WAL). If the latter is trying to sync, could the large amount of 
lower priority IO be getting in the way thanks to write barriers?


If so, can I safely turn off write barriers?

Thanks,

Richard


P.S. Should I rename this thread?




Richard Neill wrote:

Dear All,

It definitely looks checkpoint-related - the checkpoint timeout is set 
to 5 minutes, and here is a graph of our response time (in ms) over a 1 
hour period. The query is pretty much identical each time.


Any ideas what I could do to make checkpoints not hurt performance like 
this?


Thanks,

Richard



Tom Lane wrote:

Richard Neill  writes:
Now, I understand that increasing checkpoint_segments is generally a 
good thing (subject to some limit), but doesn't that just mean that 
instead of say a 1 second outage every minute, it's a 10 second 
outage every 10 minutes?


In recent PG versions you can spread the checkpoint I/O out over a
period of time, so it shouldn't be an "outage" at all, just background
load.  Other things being equal, a longer checkpoint cycle is better
since it improves the odds of being able to coalesce multiple changes
to the same page into a single write.  The limiting factor is your
threshold of pain on how much WAL-replay work would be needed to recover
after a crash.







--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Checkpoint spikes

2009-12-08 Thread Richard Neill

Dear All,

Thanks for all your help so far. This page was particularly helpful:
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
(does the advice for 8.3 apply unchanged to 8.4?)

I'm still hitting issues with this though: sync is taking 7-10 seconds
and I need to get it down to nearer 3.

We're running a lightly-loaded system which has a realtime response 
requirement of 0.5 seconds all the time (with a few seconds permissible 
occasionally, but never exceeding 10).


So far, I've set checkpoint_segments to 128, timeout to 10min, and 
completion_target to 0.8. This helps, but not as much as I'd hoped.
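
(In postgresql.conf terms, that is:)

checkpoint_segments          = 128     # up from the 8.4 default of 3
checkpoint_timeout           = 10min
checkpoint_completion_target = 0.8     # spread the write phase over ~80%
                                       # of the checkpoint interval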


But I haven't touched any of the other WAL or BG Writer settings.

Where should I look next?
  Should I be looking at the BG Writer settings,
  or should I look at the Linux VM configuration?
(eg changing /proc/sys/vm/dirty_background_ratio from 5 to 1)

  Or would it be most useful to try to move the WAL to a different disk?


Latest messages:

# tail -f /var/log/postgresql/postgresql-8.4-main.log | grep check

2009-12-08 09:12:00 GMT LOG:  checkpoint starting: time
2009-12-08 09:20:09 GMT LOG:  checkpoint complete: wrote 51151 buffers 
(8.9%); 0 transaction log file(s) added, 0 removed, 23 recycled; 
write=479.669 s, sync=9.852 s, total=489.553 s


2009-12-08 09:22:00 GMT LOG:  checkpoint starting: time
2009-12-08 09:30:07 GMT LOG:  checkpoint complete: wrote 45772 buffers 
(7.9%); 0 transaction log file(s) added, 0 removed, 24 recycled; 
write=479.706 s, sync=7.337 s, total=487.120 s


2009-12-08 09:32:00 GMT LOG:  checkpoint starting: time
2009-12-08 09:40:09 GMT LOG:  checkpoint complete: wrote 47043 buffers 
(8.2%); 0 transaction log file(s) added, 0 removed, 22 recycled; 
write=479.744 s, sync=9.300 s, total=489.122 s


2009-12-08 09:42:00 GMT LOG:  checkpoint starting: time
2009-12-08 09:50:07 GMT LOG:  checkpoint complete: wrote 48210 buffers 
(8.4%); 0 transaction log file(s) added, 0 removed, 23 recycled; 
write=479.689 s, sync=7.707 s, total=487.416 s



Thanks a lot,

Richard



--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Richard Neill

A couple of thoughts occur to me:

1. For reads, RAID 1 should also be good: it will allow a read to occur
from whichever disk can provide the data fastest.

2. Also, for reads, the more RAM you have, the better (for caching). I'd
suspect that another 8GB of RAM is a better expenditure than a 2nd drive
in many cases.

3. RAID 0 is twice as unreliable as no raid. I'd recommend using RAID 1
instead. If you use the Linux software mdraid, remote admin is easy.

4. If you can tolerate the risk of the most recent transactions being
lost, look at asynchronous commit. Likewise, you *might* consider
operating with a write cache enabled. Otherwise, the time for
fdatasync() is what's critical.

5. For a 2-disk setup, I think that main DB on one, with WAL on the
other will beat having everything on a single RAID0.

6. The WAL is relatively small: you might consider a (cheap) solid-state
disk for it.

7. If you have 3 equal disks, try doing some experiments. My inclination
would be to set them all up with ext4, then have the first disk set up
as a split between OS and WAL; the 2nd disk set up for
/var/lib/postgresql,  and the 3rd disk as a backup for everything (and a
spare OS with SSH access).

8. Lastly, if you need remote administration, and can justify another
£100 or so, the HP "iLO" (integrated lights out) cards are rather
useful: these effectively give you VNC without OS support, even for the
BIOS.

Best wishes,

Richard


Ognjen Blagojevic wrote:

Hi all,

I'm trying to figure out which HW configuration with 3 SATA drives is 
the best in terms of reliability and performance for Postgres database.


I'm thinking to connect two drives in RAID 0, and to keep the database 
(and WAL) on these disks - to improve the write performance of the SATA 
drives.


The third drive will be used to reduce the cost of the RAID 0 failure 
without reducing the performance. Say, I could configure Postgres to use 
the third drive as backup for WAL files, with archive_timeout set to 15 
minutes. Daily backups will be created on different server. Loss of last 
15 minute updates is something the customer can afford. Also, one day 
restore time is case of failure is also affordable (to reinstall the OS, 
Postgres, restore backup, and load WALs).


The server will be remotely administered, that is why I'm not going for 
RAID 1, 1+0 or some other solution for which, I beleive, the local 
administion is crucial.


Server must be low budget, that is why I'm avoiding SAS drives. We will 
use CentOS Linux and Postgres 8.4. The database will have 90% of read 
actions, and 10% of writes.


I would like to hear your opinion, is this reasonable or I should 
reconsider RAID 1?


Regards,
Ognjen




--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Richard Neill


Greg Smith wrote:

Richard Neill wrote:

3. RAID 0 is twice as unreliable as no raid. I'd recommend using RAID 1
instead. If you use the Linux software mdraid, remote admin is easy.


The main thing to be wary of with Linux software RAID-1 is that you 
configure things so that both drives are capable of booting the system.  
It's easy to mirror the data, but not the boot loader and the like.


Good point. I actually did this on a home PC (2 disks in RAID 1). The
solution is simple: just "grub-install /dev/sda; grub-install /dev/sdb"
and that's all you have to do, provided that /boot is on the raid array.

Of course, with a server machine, it's nearly impossible to use mdadm
raid: you are usually compelled to use a hardware raid card. Those are a
pain, and less configurable, but they do take care of the bootloader issue.

Obviously, test it both ways.






7. If you have 3 equal disks, try doing some experiments. My inclination
would be to set them all up with ext4...


I have yet to yet a single positive thing about using ext4 for 
PostgreSQL.  Stick with ext3, where the problems you might run into are 
at least well understood and performance is predictable.


I did some measurements on fdatasync() performance for ext2,ext3,ext4.

I found ext2 was fastest, ext4 was twice as slow as ext2, and ext3 was
about 5 times slower than ext2. Also, ext4 doesn't have an
appallingly slow fsck.

We've had pretty good results from ext4.

Richard





--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Richard Neill



Jeremy Harris wrote:

On 12/24/2009 05:12 PM, Richard Neill wrote:

Of course, with a server machine, it's nearly impossible to use mdadm
raid: you are usually compelled to use a hardware raid card.


Could you expand on that?


Both of the last machines I bought (an IBM X3550 and an HP DL380) come 
with hardware raid solutions. These are an utter nuisance because:


  - they can only be configured from the BIOS (or with a
bootable utility CD). Linux has very basic monitoring tools,
but no way to reconfigure the array, or add disks to empty
hot-swap slots while the system is running.

  - If there is a Linux raid config program, it's not part of the
main packaged distro, but usually a pre-built binary, available
for only one release/kernel of the wrong distro.

  - the IBM one had dodgy firmware, which, until updated, caused the
disk to totally fail after a few days.

  - you pay a lot of money for something effectively pointless, and
have less control and less flexibility.

After my experience with the X3550, I hunted for any server that would 
ship without hardware raid, i.e. connect the 8 SATA hotswap slots direct 
to the motherboard, or where the hardware raid could be de-activated 
completely, and put into pass-through mode. Neither HP nor IBM make such 
a thing.


Richard






- Jeremy



--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Richard Neill



Adam Tauno Williams wrote:

This isn't true.  IBM's IPS series controllers can be checked and configured 
via the ipssend utility, which works very well in 2.6.x Linux.



Unfortunately, what we got (in the IBM) was the garbage ServeRaid 8kl 
card. This one is atrocious - it shipped with a hideous firmware bug. 
And there is no way to bypass it.


The HP have the P400 cards, which are decent in themselves, just not as 
good as software raid.


Richard



"Scott Marlowe"  wrote:


On Thu, Dec 24, 2009 at 11:09 AM, Richard Neill  wrote:


Jeremy Harris wrote:

On 12/24/2009 05:12 PM, Richard Neill wrote:

Of course, with a server machine, it's nearly impossible to use mdadm
raid: you are usually compelled to use a hardware raid card.

Could you expand on that?

Both of the last machines I bought (an IBM X3550 and an HP DL380) come with
hardware raid solutions. These are an utter nuisance because:

 - they can only be configured from the BIOS (or with a
   bootable utility CD). Linux has very basic monitoring tools,
   but no way to reconfigure the array, or add disks to empty
   hot-swap slots while the system is running.

 - If there is a Linux raid config program, it's not part of the
   main packaged distro, but usually a pre-built binary, available
   for only one release/kernel of the wrong distro.

 - the IBM one had dodgy firmware, which, until updated, caused the
   disk to totally fail after a few days.

 - you pay a lot of money for something effectively pointless, and
   have less control and less flexibility.

After my experience with the X3550, I hunted for any server that would ship
without hardware raid, i.e. connect the 8 SATA hotswap slots direct to the
motherboard, or where the hardware raid could be de-activated completely,
and put into pass-through mode. Neither HP nor IBM make such a thing.

Yep.  And that's why I never order servers from them.  There are
dozens of reputable white box builders (I use Aberdeen who give me a 5
year all parts warranty and incredible customer service, but there are
plenty to choose from) and they build the machine I ask them to build.
For hardware RAID I use Areca 1680 series, and they also provide me
with machines with software RAID for lighter loads (slave dbs,
reporting dbs, and stats dbs)

--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


--
Message composed using K-9 mail on Android.
Apologies for improper reply quoting (not supported) by client.


--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SATA drives performance

2009-12-24 Thread Richard Neill



Scott Marlowe wrote:

On Thu, Dec 24, 2009 at 3:51 PM, Richard Neill  wrote:


Adam Tauno Williams wrote:

This isn't true.  IBM's IPS series controllers can be checked and
configured via the ipssend utility, which works very well in 2.6.x Linux.


Unfortunately, what we got (in the IBM) was the garbage ServeRaid 8kl card.
This one is atrocious - it shipped with a hideous firmware bug. And there is
no way to bypass it.


Can you replace the IBM RAID controller with some other controller?
Even just a simple 4 or 8 port SATA card with no RAID capability would
be better than something that locks up.


A replacement would have been nice, however the 8kl is very tightly 
integrated with the motherboard and the backplane. We'd have had to buy 
a PCI-X card, and then get out the soldering iron to fix the cables.


To be fair, the 8kl is now working OK; also, there was a note in the box 
mentioning that firmware updates should be applied if available. What I 
found unbelievable was that IBM shipped the server to me with known 
crashing firmware (a sufficiently bad bug, imho, to merit a product 
recall), and hadn't bothered to flash it themselves in the factory. 
Usually BIOS updates are only applied by the end user if there is a 
specific issue to fix, and if the product line has been out for years - 
but that particular server was assembled only 3 weeks ago. One wouldn't 
expect a company of IBM's standing to ship it in that state.


Richard

--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] DB is slow until DB is reloaded

2010-01-04 Thread Richard Neill


Yup, I even tried manually running 'VACUUM FULL' and it didn't help. As 
for upgrading;


VACUUM FULL is usually considered a bad idea. What you probably want to 
do instead is CLUSTER, followed by ANALYZE.


Basically, VACUUM makes the indexes smaller (but doesn't reclaim much 
space from the tables themselves). VACUUM FULL reclaims space from the 
tables, but bloats the indexes.
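
(For example - table and index names here are placeholders:)

-- 8.3+ syntax; older releases spell it "CLUSTER indexname ON tablename"
CLUSTER my_table USING my_table_pkey;  -- rewrites the table and its indexes compactly
ANALYZE my_table;                      -- refresh planner statistics afterwards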


a) I am trying to find a way around the dump/reload. I am doing it as a 
"last resort" only.

b) I want to keep the version in CentOS' repo.



Postgres is pretty easy to build from source. It's nicely 
self-contained, and won't bite you with dependency hell. So don't be too 
wary of compiling it.


Richard

--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] noob inheritance question

2010-01-06 Thread Richard Neill

Zintrigue wrote:

I'm hoping the inheritance feature will be a nice alternative method for 
me to implement categories in a particular database of products I need to 
keep updated. I suppose in MySQL I would probably do this by creating, 
for example, one table for the products, and then a table(s) for 
categories, and then I'd be faced with a choice between using an 
adjacency list or nested set paradigm for, say, breadcrumb links in my 
private web app.


On the other hand, in Postgres what I'd like to do it just create an 
empty root "product" table, then create, for example, a "spirts" table 
that inherits from products, and "rums" table that inherits from 
spirits, and then "aged rum", "flavored rum", et al, which inherit from 
rums.


In this scenario, my idea was to have all my fields in "products" and to 
not add any additional fields in the child tables. Also, only the lowest 
level of child tables in any given branch of products would actually 
contain data / rows.


Assuming this is a good design,


May I venture to stop you there. This sounds like you are doing it
The Hard Way.

In particular, each time you add a new category, you're going to have to 
add a new database table, and your schema is going to get to be 
horrible. Inserts aren't going to be much fun either.


Rather than adding multiple child tables, may I suggest some other way 
of tracking which item is a subset of the other.

 You could do it by having 2 columns:
id, parent_id  (each integer and indexed)
or you could do it by having 2 columns:
id, list   (id is integer, list is eg "1,3,5,13")
(where the list is a comma-separated list, or an array, and holds the 
full path)
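
As a sketch of the first (adjacency-list) option - the table and names 
here are hypothetical, and the recursive breadcrumb query needs 8.4 or 
later:

CREATE TABLE category (
    id        integer PRIMARY KEY,
    parent_id integer REFERENCES category (id),  -- NULL for a root category
    name      text NOT NULL
);
CREATE INDEX category_parent_id_idx ON category (parent_id);

-- breadcrumb trail for category 42, walking up to the root:
WITH RECURSIVE crumbs AS (
    SELECT id, parent_id, name FROM category WHERE id = 42
    UNION ALL
    SELECT c.id, c.parent_id, c.name
    FROM category c JOIN crumbs ON c.id = crumbs.parent_id
)
SELECT name FROM crumbs;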



Depending on scale, you may be able to choose a simple algorithm instead 
of hunting for the most efficient one.



Best wishes,

Richard


P.S. This is the performance mailing list - you may find one of the 
other lists better suited to your questions.


--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Joint index including MAX() ?

2010-01-09 Thread Richard Neill

Dear All,

I'm trying to optimise the speed of some selects with the where condition:

WHERE id =
 (SELECT MAX(id) FROM tbl_sort_report WHERE parcel_id_code='43024')


This is relatively slow, taking about 15-20ms, even though I have a 
joint index on both fields:


CREATE INDEX testidx3 ON tbl_sort_report (id, parcel_id_code);


So, my question is: is there any way to improve this? I'd expect that an 
index on ( max(id), parcel_id_code ) would be ideal, except that 
postgres won't allow that (and such an index probably doesn't make much 
conceptual sense).
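
(For the archives: the usual fix is to reverse the column order, so that 
the equality column leads - a sketch, with a hypothetical index name. 
The planner should then be able to answer MAX(id) for one 
parcel_id_code with a single backward index probe:)

CREATE INDEX testidx4 ON tbl_sort_report (parcel_id_code, id);

SELECT MAX(id) FROM tbl_sort_report WHERE parcel_id_code = 43024;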



Explain Analyze is below.

Thanks,

Richard



Here is part of the schema. id is the primary key; parcel_id_code loops 
from 0...99999 and back again every few hours.


fsc_log=> \d tbl_sort_report
Table "public.tbl_sort_report"
Column|   Type   | 
 Modifiers

--+--+-
 id   | bigint   | not null default 
nextval('master_id_seq'::regclass)

 timestamp| timestamp with time zone |
 parcel_id_code   | integer  |
(etc)




EXPLAIN ANALYZE (SELECT MAX(id) FROM tbl_sort_report WHERE 
parcel_id_code='43024');


QUERY PLAN

 Result  (cost=7.34..7.35 rows=1 width=0) (actual time=17.712..17.714 
rows=1 loops=1)

   InitPlan 1 (returns $0)
 ->  Limit  (cost=0.00..7.34 rows=1 width=8) (actual 
time=17.705..17.705 rows=0 loops=1)
   ->  Index Scan Backward using testidx3 on tbl_sort_report 
(cost=0.00..14.67 rows=2 width=8) (actual time=17.700..17.700 rows=0 
loops=1)

 Index Cond: (parcel_id_code = 43024)
 Filter: (id IS NOT NULL)
 Total runtime: 17.786 ms


--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Fragmentation/Vacuum, Analyze, Re-Index

2010-01-22 Thread Richard Neill

DM wrote:
Is there any script/tool to identify if a table requires a full vacuum, 
or if an existing index needs re-indexing?




Don't know if there is a script to specifically do this, though you may 
find this query a useful one:


SELECT relname, reltuples, relpages FROM pg_class ORDER BY relpages DESC;


(it shows what's currently using most of the disk).


In general though, you should never use "VACUUM FULL". The best bet is 
to tune autovacuum to be more aggressive, and then occasionally run CLUSTER.


Best wishes,

Richard




Thanks
Deepak



--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-26 Thread Richard Neill

Dear All,

Just wondering whether there is scope for the query planner 
(on 8.4.2) to be cleverer than it currently is.


Specifically, I wonder whether the optimiser should know that by 
converting a CASE condition into a WHERE condition, it can use an index.


Have I found a possible enhancement, or is this simply too hard to do?

Best wishes,

Richard



Example:


In this example, tbl_tracker has 255751 rows, with a primary key "id", 
whose values lie uniformly in the range 1...1255750.


If one is trying to count multiple conditions, the following query seems 
to be the most obvious way to do it:


SELECT
  SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) AS c1,
  SUM (case when id > 1210000 and id < 1220000 then 1 else 0 end) AS c2,
  SUM (case when id > 1220000 and id < 1230000 then 1 else 0 end) AS c3,
  SUM (case when id > 1230000 and id < 1240000 then 1 else 0 end) AS c4,
  SUM (case when id > 1240000 and id < 1250000 then 1 else 0 end) AS c5
FROM tbl_tracker;


  c1  |  c2  |  c3  |  c4  |  c5
--+--+--+--+--
 2009 | 2018 | 2099 | 2051 | 2030

Time: 361.666 ms



This can be manually optimised into a far uglier (but much much faster) 
query:


SELECT * FROM
 (SELECT COUNT (1) AS c1 FROM tbl_tracker
WHERE id > 1200000 and id < 1210000) AS s1,
 (SELECT COUNT (1) AS c2 FROM tbl_tracker
WHERE id > 1210000 and id < 1220000) AS s2,
 (SELECT COUNT (1) AS c3 FROM tbl_tracker
WHERE id > 1220000 and id < 1230000) AS s3,
 (SELECT COUNT (1) AS c4 FROM tbl_tracker
WHERE id > 1230000 and id < 1240000) AS s4,
 (SELECT COUNT (1) AS c5 FROM tbl_tracker
WHERE id > 1240000 and id < 1250000) AS s5

  c1  |  c2  |  c3  |  c4  |  c5
--+--+--+--+--
 2009 | 2018 | 2099 | 2051 | 2030
(1 row)

Time: 21.091 ms





Debugging
-

The simple queries are:

SELECT SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) 
from tbl_tracker;


Time: 174.804 ms

Explain shows that this does a sequential scan.



SELECT COUNT(1) from tbl_tracker WHERE id > 1200000 and id < 1210000;

Time: 4.153 ms

Explain shows that this uses the index, as expected.



--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-26 Thread Richard Neill

Thanks for your answers.


David Wilson wrote:

> Why not simply add the where clause to the original query?
>
> SELECT
> SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) AS c1,
> SUM (case when id > 1210000 and id < 1220000 then 1 else 0 end) AS c2,
> SUM (case when id > 1220000 and id < 1230000 then 1 else 0 end) AS c3,
> SUM (case when id > 1230000 and id < 1240000 then 1 else 0 end) AS c4,
> SUM (case when id > 1240000 and id < 1250000 then 1 else 0 end) AS c5
> FROM tbl_tracker WHERE (id>1200000) AND (id<1250000);
>
> I didn't populate any test tables, but I'd expect that to do just as
> well without being any uglier than the original query is.

You're absolutely right, but I'm afraid this won't help. I'd simplified 
the original example query, but in real life, I've got about 50 
different sub-ranges, which cover virtually all the id-space.


--

Tom Lane wrote:

Richard Neill  writes:

SELECT
   SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) AS c1,
   SUM (case when id > 1210000 and id < 1220000 then 1 else 0 end) AS c2,
   ...
FROM tbl_tracker;


This can be manually optimised into a far uglier (but much much faster) 
query:



SELECT * FROM
  (SELECT COUNT (1) AS c1 FROM tbl_tracker
 WHERE id > 1200000 and id < 1210000) AS s1,
  (SELECT COUNT (1) AS c2 FROM tbl_tracker
 WHERE id > 1210000 and id < 1220000) AS s2,
  ...


We're unlikely to consider doing this, for a couple of reasons:
it's unlikely to come up often enough to justify the cycles the planner
would spend looking for the case *on every query*, and it requires very
special knowledge about the behavior of two specific aggregate functions,
which is something the planner tends to avoid using.



OK - that's all I was wondering. I thought I'd raise this in case it 
might be helpful.


I'll add a note to:
http://www.postgresql.org/docs/8.4/interactive/functions-conditional.html
to point out that this is something of a trap for the unwary.

Regards,

Richard

--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Limited Shared Buffer Problem

2010-01-29 Thread Richard Neill



Rod MacNeil wrote:

Hi All,

I have a server running CentOS5 with 6gb of memory that will run 
postgres 8.3 exclusively.

I would like to allocate 4gb of the memory to shared buffers for postgres.


It might be worth pausing at this point:

The various postgresql tuning guides usually suggest that on a dedicated 
system, you should give postgres about 1/4 of the RAM for shared 
buffers, while telling it that the effective_cache_size = 1/2 RAM.


Postgres will make good use of the OS cache as a file-cache - the 
"effective_cache_size" setting is advisory to postgres that it can 
expect about this much data to be in RAM.


Also, if you are setting up a new system, it's probably worth going for 
8.4.2. Postgres is relatively easy to build from source.


HTH,

Richard

--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Why does the query planner use two full indexes, when a dedicated partial index exists?

2012-12-19 Thread Richard Neill

Dear All,

I've just joined this list, and I'd like to request some advice.

I have a table (1 GB in size) with 24 columns, and 5.6 million rows. Of 
these, we're interested in two columns, parcel_id_code, and exit_state.


parcel_id_code has a fairly uniform distribution of integers
from 1-99999; it's never null.

exit_state has 3 possible values, 1,2 and null.
Almost all the rows are 1, about 0.1% have the value 2, and
only 153 rows are null


The query I'm trying to optimise looks like this:

SELECT * from  tbl_tracker
WHERE parcel_id_code='53030' AND exit_state IS NULL;

So, I have a partial index:

"tbl_tracker_performance_1_idx" btree (parcel_id_code) WHERE
exit_state IS NULL

which works fine if it's the only index.


BUT, for other queries (unrelated to this question), I also have to have 
full indexes on these columns:


"tbl_tracker_exit_state_idx" btree (exit_state)
"tbl_tracker_parcel_id_code_idx" btree (parcel_id_code)


The problem is, when I now run my query, the planner ignores the 
dedicated index "tbl_tracker_performance_1_idx", and instead uses both 
of the full indexes... resulting in a much much slower query (9ms vs 
0.08ms).


A psql session is below.  This shows that, if I force the planner to use 
the partial index, by dropping the others, then it's fast. But as soon 
as I put the full indexes back (which I need for other queries), the 
query planner chooses them instead, and is slow.



Thanks very much for your help,

Richard










fsc_log => \d tbl_tracker

   Column|   Type   |   Modifiers
-+--+--
 id  | bigint   | not null default 
nextval('master_id_seq'::regclass)

 dreq_timestamp_1| timestamp with time zone |
 barcode_1   | character varying(13)|
 barcode_2   | character varying(13)|
 barcode_best| character varying(13)|
 entrance_point  | character varying(13)|
 induct  | character varying(5) |
 entrance_state_x| integer  |
 dreq_count  | integer  |
 parcel_id_code  | integer  |
 host_id_code| bigint   |
 original_dest   | integer  |
 drep_timestamp_n| timestamp with time zone |
 actual_dest | integer  |
 exit_state  | integer  |
 chute   | integer  |
 original_dest_state | integer  |
 srep_timestamp  | timestamp with time zone |
 asn | character varying(9) |
 is_asn_token| boolean  |
 track_state | integer  |
 warning | boolean  |
Indexes:
"tbl_tracker_pkey" PRIMARY KEY, btree (id) CLUSTER
"tbl_tracker_barcode_best_idx" btree (barcode_best)
"tbl_tracker_chute_idx" btree (chute)
"tbl_tracker_drep_timestamp_n_idx" btree (drep_timestamp_n) WHERE 
drep_timestamp_n IS NOT NULL
"tbl_tracker_dreq_timestamp_1_idx" btree (dreq_timestamp_1) WHERE 
dreq_timestamp_1 IS NOT NULL

"tbl_tracker_exit_state_idx" btree (exit_state)
"tbl_tracker_parcel_id_code_idx" btree (parcel_id_code)
"tbl_tracker_performance_1_idx" btree (parcel_id_code) WHERE 
exit_state IS NULL

"tbl_tracker_performance_2_idx" btree (host_id_code, id)
"tbl_tracker_performance_3_idx" btree (srep_timestamp) WHERE 
exit_state = 1 AND srep_timestamp IS NOT NULL
"tbl_tracker_srep_timestamp_idx" btree (srep_timestamp) WHERE 
srep_timestamp IS NOT NULL





fsc_log=> explain analyse select * from  tbl_tracker where 
parcel_id_code='53030' AND exit_state IS NULL;


QUERY  PLAN
---
 Bitmap Heap Scan on tbl_tracker  (cost=8.32..10.84 rows=1 width=174) 
(actual time=9.334..9.334 rows=0 loops=1)

   Recheck Cond: ((parcel_id_code = 53030) AND (exit_state IS NULL))
   ->  BitmapAnd  (cost=8.32..8.32 rows=1 width=0) (actual 
time=9.329..9.329 rows=0 loops=1)
 ->  Bitmap Index Scan on tbl_tracker_parcel_id_code_idx 
(cost=0.00..3.67 rows=57 width=0) (actual time=0.026..0.026 rows=65 loops=1)

   Index Cond: (parcel_id_code = 53030)
 ->  Bitmap Index Scan on tbl_tracker_exit_state_idx 
(cost=0.00..4.40 rows=150 width=0) (actual time=9.289..9.289 rows=93744 
loops=1)

   Index Cond: (exit_state IS NULL)
 Total runtime: 9.366 ms
(8 rows)



fsc_log=> drop index tbl_tracker_exit_state_idx;
DROP INDEX

fsc_log=> explain analyse select * from  tbl_tracker where 
parcel_id_code='53030' AND exit_state IS NULL;


QUERY  PLAN

 Bitmap Heap Scan on tbl_tracker  (cost=3.67..145.16 rows=1 width=174) 
(actual time=0.646..0.646 rows=0 loops=1)

  

Re: [PERFORM] Why does the query planner use two full indexes, when a dedicated partial index exists?

2012-12-19 Thread Richard Neill



On 19/12/12 22:59, Sergey Konoplev wrote:


On Wed, Dec 19, 2012 at 1:13 PM, Richard Neill  wrote:

  Index Scan using tbl_tracker_performance_1_idx on tbl_tracker
(cost=0.00..5440.83 rows=1 width=174) (actual time=0.052..0.052 rows=0
loops=1)
Index Cond: (parcel_id_code = 53030)


It looks like your index is bloated. Have you had a lot of
updates/deletes on rows where exit_state is null?

Try to reindex tbl_tracker_performance_1_idx.

To reindex it without locks create a new index with temporary name
concurrently, delete the old one and rename the new one using the old
name.
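
(In SQL terms, Sergey's recipe is - a sketch based on the index
definition shown earlier; note that CREATE INDEX CONCURRENTLY cannot
run inside a transaction block:)

CREATE INDEX CONCURRENTLY tbl_tracker_performance_1_tmp
    ON tbl_tracker (parcel_id_code) WHERE exit_state IS NULL;
DROP INDEX tbl_tracker_performance_1_idx;
ALTER INDEX tbl_tracker_performance_1_tmp
    RENAME TO tbl_tracker_performance_1_idx;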



Hi Sergey,

Thanks for your suggestion. Yes, I can see what you mean: over the 3 
weeks during which we deployed the system, every single row has at one 
point had the exit_state as null, before being updated.


Essentially, as time moves on, new rows are added, initially with 
exit_state null, then a few minutes later we update them to exit_state 
1, then a few weeks later, they are removed.


[Explanation: the system tracks books around a physical sortation 
machine; the sorter uses a "parcel_id_code" which (for some really daft 
reason) suffers wraparound at 99999, i.e. about every 3 hours; books 
whose exit_state is null are those which are still on the sortation 
machine; once they exit, the state is either 1 (successful delivery) or 
2 (collision, and down the dump chute).]


BUT

* The reindex solution doesn't work. I just tried it, and the query 
planner is still using the wrong indexes.


* If the tbl_tracker_performance_1_idx had indeed become bloated, 
wouldn't that have meant that when the query planner was forced to use 
it (by deleting the alternative indexes), it would have been slow?


Also, I thought that reindex wasn't supposed to be needed in normal 
operation.


Best wishes,

Richard




--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why does the query planner use two full indexes, when a dedicated partial index exists?

2012-12-19 Thread Richard Neill

Thanks for your help,

On 20/12/12 00:08, Sergey Konoplev wrote:

On Wed, Dec 19, 2012 at 3:49 PM, Richard Neill  wrote:

* The reindex solution doesn't work. I just tried it, and the query planner
is still using the wrong indexes.


Can you show the explain analyze with tbl_tracker_performance_1_idx
straight after reindex (eg. before it has been bloated again)?


Sure. Just done it now... the system has been fairly lightly loaded for 
the last few hours - though I did have to change the specific number of 
the parcel_id_code in the query.




fsc_log=> explain analyse select * from tbl_tracker where 
parcel_id_code=92223 and exit_state is null;


QUERY PLAN
---
 Index Scan using tbl_tracker_exit_state_idx on tbl_tracker 
(cost=0.00..6.34 rows=1 width=174) (actual time=0.321..1.871 rows=1 loops=1)

   Index Cond: (exit_state IS NULL)
   Filter: (parcel_id_code = 92223)
 Total runtime: 1.905 ms
(4 rows)



And now, force it, by dropping the other index (temporarily):

fsc_log=> drop index tbl_tracker_exit_state_idx;
DROP INDEX


fsc_log=> explain analyse select * from tbl_tracker where 
parcel_id_code=92223 and exit_state is null;


QUERY PLAN
-
 Index Scan using tbl_tracker_performance_1_idx on tbl_tracker 
(cost=0.00..7.78 rows=1 width=174) (actual time=0.040..0.041 rows=1 loops=1)

   Index Cond: (parcel_id_code = 92223)
 Total runtime: 0.077 ms
(3 rows)



As far as I can tell, the query planner really is just getting it wrong.

BTW, there is a significant effect on speed caused by running the same 
query twice (it pulls stuff from disk into the OS disk-cache), but I've 
already accounted for this.



Richard



--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why does the query planner use two full indexes, when a dedicated partial index exists?

2012-12-19 Thread Richard Neill

Dear Tom,

Thanks very much for your advice.


A psql session is below.  This shows that, if I force the planner to use
the partial index, by dropping the others, then it's fast. But as soon
as I put the full indexes back (which I need for other queries), the
query planner chooses them instead, and is slow.


[ experiments with a similar test case ... ]  I think the reason why the
planner is overestimating the cost of using the partial index is that
9.1 and earlier fail to account for the partial-index predicate when
estimating the number of index rows that will be visited.  Because the
partial-index predicate is so highly selective in this case, that
results in a significant overestimate of how much of the index will be
traversed.


That seems likely to me.

I'll try out 9.2 and see if it helps. As it's a production server, I 
have to wait for some downtime, probably Friday night before I can find 
out - will report back.


Best wishes,

Richard


--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why does the query planner use two full indexes, when a dedicated partial index exists?

2012-12-19 Thread Richard Neill

Dear Jeff,

Thanks for your help,


* The reindex solution doesn't work. I just tried it, and
the query planner
is still using the wrong indexes.


It switched to a better one of the wrong indices, though, and got
several times faster.



I think that this is a red herring. The switching between the two 
"wrong" indices seems to be caused by non-uniformity in the 
parcel_id_code: although it's distributed fairly well across 1-99999, 
it's not perfect.


As for the speed-up, I think that's mostly caused by the fact that 
running "Analyse" is pulling the entire table (and the relevant index) 
into RAM and flushing other things out of that cache.



How did it get so bloated in the first place?  Is the table being
updated so rapidly that the statistics might be wrong even immediately
after analyze finishes?


I don't think it is. We're doing about 10 inserts and 20 updates per 
second on that table. But when I tested it, production had stopped for 
the night - so the system was quiescent between the analyse and the select.



In any case, I can't get it to prefer the full index in 9.1.6 at all.
  The partial index wins hands down unless the table is physically
clustered by the parcel_id_code column.  In that case, the partial
index wins by only a little bit.


Interesting that you should say that... the original setup script did 
choose to cluster the table on that column.


Also, I wonder whether it matters which order the indexes are created in?


Best wishes,

Richard


--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why does the query planner use two full indexes, when a dedicated partial index exists?

2012-12-20 Thread Richard Neill

Dear Tom,

Thanks again for your help on this.

On 20/12/12 03:06, Tom Lane wrote:

Richard Neill  writes:

The problem is, when I now run my query, the planner ignores the
dedicated index "tbl_tracker_performance_1_idx", and instead uses both
of the full indexes... resulting in a much much slower query (9ms vs
0.08ms).




I've now installed 9.2. As you said, thanks to the change in 9.2 it 
initially prefers the partial index.


BUT, after 1 cycle of inserting 500k rows, then deleting them all, then 
starting to insert again, I find that the planner has reverted to the 
former bad behaviour.


Reindexing only takes a couple of seconds, and restores correctness.

What's going on? Do I need to run reindex in a cron-job? I thought that 
reindex wasn't "normally" needed, and that index bloat happened only 
after every row had changed value hundreds of times.


Thanks,

Richard


-
Here's the same session again.

[Please ignore the dreq_timestamp_1 check - I mistakenly failed to 
simplify it out of the query, and now that I've reindexed, I can't redo the 
experiment. I don't think it makes any difference.]



fsc_log=> explain analyse select * from tbl_tracker WHERE
parcel_id_code='90820' AND exit_state IS NULL AND (dreq_timestamp_1 > 
timestamp '2012-12-20 13:02:36.652' - INTERVAL '36 hours');


QUERY PLAN
---
 Bitmap Heap Scan on tbl_tracker  (cost=17.35..19.86 rows=1 width=174) 
(actual time=8.056..8.056 rows=0 loops=1)

   Recheck Cond: ((exit_state IS NULL) AND (parcel_id_code = 90820))
   Filter: (dreq_timestamp_1 > '2012-12-19 01:02:36.652'::timestamp 
without time zone)
   ->  BitmapAnd  (cost=17.35..17.35 rows=1 width=0) (actual 
time=8.053..8.053 rows=0 loops=1)
 ->  Bitmap Index Scan on tbl_tracker_exit_state_idx 
(cost=0.00..8.36 rows=151 width=0) (actual time=7.946..7.946 rows=20277 
loops=1)

   Index Cond: (exit_state IS NULL)
 ->  Bitmap Index Scan on tbl_tracker_parcel_id_code_idx 
(cost=0.00..8.73 rows=58 width=0) (actual time=0.025..0.025 rows=72 loops=1)

   Index Cond: (parcel_id_code = 90820)
 Total runtime: 8.090 ms
(9 rows)


fsc_log=> REINDEX index tbl_tracker_performance_1_idx;
#This only took a couple of seconds to do.

fsc_log=> explain analyse select * from tbl_tracker WHERE 
parcel_id_code='90820' AND exit_state IS NULL AND (dreq_timestamp_1 > 
timestamp '2012-12-20 13:02:36.652' - INTERVAL '36 hours');


QUERY PLAN
---


 Index Scan using tbl_tracker_performance_1_idx on tbl_tracker 
(cost=0.00..5.27 rows=1 width=174) (actual time=0.019..0.019 rows=0 loops=1)

   Index Cond: (parcel_id_code = 90820)
   Filter: (dreq_timestamp_1 > '2012-12-19 01:02:36.652'::timestamp 
without time zone)

 Total runtime: 0.047 ms
(4 rows)





--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why does the query planner use two full indexes, when a dedicated partial index exists?

2012-12-20 Thread Richard Neill



On 21/12/12 02:34, Richard Neill wrote:


Reindexing only takes a couple of seconds, and restores correctness.



Interestingly, the partial index (after reindexing) is only 16kB in 
size; whereas the table is 1.1 GB, and the normal single-column indexes 
are about 250MB in size.


In terms of what's physically happening in reality,

- tbl_tracker keeps a record of all books that move through the system
  over a period of one month (at a rate of about 20/second, or 1
  million/day), after which they are deleted.

- the partial index, tbl_tracker_performance_1_idx tracks only those
  books which are currently "in flight" - books remain in flight for
  about 200 seconds as they go round the machine.
  (While in flight, these have exit_state = NULL)

- the partial index is used to overcome a design defect(*) in the
  sorter machine, namely that it doesn't number each book uniquely,
  but wraps the parcel_id_code every few hours. Worse, some books can
  remain on the sorter for several days (if they jam), so the numbering
  isn't a clean "wraparound", but more like a fragmented (and
  occasionally lossy) filesystem.

- What I'm trying to do is trace the history of the books
  through the system and assign each one a proper unique id.
  So, if I see a book with "parcel_id_code = 37",
  is it a new book (after pid wrap), or is it the same book I saw 1
  minute ago, that hasn't exited the sorter?


So... is there some way to, for example, set a trigger that will reindex 
every time the index exceeds 1000 rows?



Richard



(*)Readers of The Daily WTF might appreciate another curious anomaly: 
this machine originally had an RS-232 port; it now uses ethernet, but 
TxD and RxD use different TCP sockets on different network ports!



--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why does the query planner use two full indexes, when a dedicated partial index exists?

2012-12-22 Thread Richard Neill



I've now installed 9.2. As you said, thanks to the change in 9.2 it
initially prefers the partial index.

BUT, after 1 cycle of inserting 500k rows, then deleting them all,
then starting to insert again, I find that the planner has reverted
to the former bad behaviour.


Presumably the real workload has this type of turnover happen one row
at a time, rather than all in one giant mass update transaction, right?
  That makes a big difference in the way space is re-used.


Sorry - I meant a "real" workload here. I replayed a whole day's worth 
of real data into the DB, and that's what I meant by a cycle. Everything 
was row-at-a-time.

(It currently takes about an hour to do this)



Reindexing only takes a couple of seconds, and restores correctness.


Even your slow query is pretty fast.  If you can't afford that, can you
afford to take an exclusive lock for a couple of seconds every few minutes?


Yes, I can. If that's the root cause, I'll do that. But it seems to me 
that I've stumbled upon some rather awkward behaviour that I need to 
understand fully, and if the index is bloating that badly and that 
quickly, then perhaps it's a PG bug (or at least cause for a logfile 
warning).


BTW, The index has gone from 16kB to 4.5MB in 6 hours of runtime today. 
It still only has 252 matching rows.




What's going on? Do I need to run reindex in a cron-job? I thought
that reindex wasn't "normally" needed, and that index bloat happened
only after every row had changed value hundreds of times.


The partial index is highly leveraged.  If every tuple in the table is
updated once, that amounts to every tuple in the index being updated
25,000 times.


How so? That sounds like O(n^2) behaviour.




For the same reason, it is probably not getting vacuum often enough.
  The default settings have the table vacuumed once 20% of its rows
turns over, but that means the partial index has been turned over many
many times.  You could crank down the auto-vacuum settings for that
table, or run manual vacuum with a cron job.

Vacuum will not unbloat the index, but if you run it often enough it
will keep the bloat from getting too bad in the first place.


Thanks. I've reduced  autovacuum_vacuum_scale_factor from 0.2 to 0.05
(and set autovacuum_analyze_scale_factor = 0.05 for good measure)
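
(The same thing can also be done per-table with storage parameters,
which would avoid changing the global defaults - a sketch:)

ALTER TABLE tbl_tracker SET (
    autovacuum_vacuum_scale_factor  = 0.05,
    autovacuum_analyze_scale_factor = 0.05
);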

As I understand it, both of these can run in parallel, and I have 7 
cores usually idle, while the other is maxed out.



But what I think I'd do is change one of your full indexes to contain
the other column as well, and get rid of the partial index.  It might
not be quite as efficient as the partial index might theoretically be,
but it should be pretty good and also be less fragile.
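
(i.e. something along these lines - a sketch, reusing the index names
from the \d output earlier in the thread:)

DROP INDEX tbl_tracker_parcel_id_code_idx;
DROP INDEX tbl_tracker_performance_1_idx;
CREATE INDEX tbl_tracker_parcel_id_code_idx
    ON tbl_tracker (parcel_id_code, exit_state);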


I'll try that.

Thanks,

Richard


--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why does the query planner use two full indexes, when a dedicated partial index exists?

2012-12-22 Thread Richard Neill



On 21/12/12 05:15, Jeff Janes wrote:



- What I'm trying to do is trace the history of the books
   through the system and assign each one a proper unique id.
   So, if I see a book with "parcel_id_code = 37",
   is it a new book (after pid wrap), or is it the same book I saw 1
   minute ago, that hasn't exited the sorter?

I'm not sure how you are implementing this goal, but I don't think it is
best done by looping over all books (presumably from some other table?)
and issuing an individual query for each one, if that is what you are
doing.  Some kind of bulk join would probably be more efficient.


It would be nice to do a bulk join, but it's not possible: the query is 
time sensitive. Consider:


id/pkey pid  timestamp   exit_state destination

1   77  -24 hours   1   212
2   77  -18 hours   1   213
3   77  -12 hours   1   45
4   77  -6 hours1   443
5   77  0 hours null

[in future...]
5   77  0 hours 1   92
6   77  4 hours null


At time +5 minutes, I receive a report that a book with parcel_id 77 has 
successfully been delivered to destination 92.  So, what I have to do is:


* First, find the id of the most recent book which had pid=77 and where 
the exit state is null. (hopefully, but not always, this yields exactly 
one row, which in this case is id=5)


* Then update the table to set the destination to 92, where the id=5.
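
In SQL, against the tbl_tracker schema shown earlier, that is roughly (a
sketch - "most recent" is approximated by the largest id, and actual_dest
stands in for the destination column):

UPDATE tbl_tracker
SET    exit_state = 1, actual_dest = 92
WHERE  id = (SELECT MAX(id) FROM tbl_tracker
             WHERE parcel_id_code = 77 AND exit_state IS NULL);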


It's a rather cursed query, because:
 - the sorter machine doesn't give me full info in each message, only
   deltas, and I have to reconstruct the global state.
 - pids are reused within hours, but don't increase monotonically,
   (more like drawing repeatedly from a shuffled deck, where cards
   are only returned to the deck sporadically).
 - some pids get double-reported
 - 1% of books fall off the machine, or get stuck on it.
 - occasionally, messages are lost.
 - the sorter state isn't self-consistent (it can be restarted)


The tracker table is my attempt to consistently combine all the state we 
know, and merge in the deltas as we receive messages from the sorter 
machine. It ends up reflecting reality about 99% of the time.



Richard








Re: [PERFORM] Why does the query planner use two full indexes, when a dedicated partial index exists? (solved?)

2012-12-24 Thread Richard Neill

Dear All,

I think periodic reindex may be the solution. Even after reducing the 
autovacuum fraction to 0.05, the index still seems to bloat.


After another couple of days' runtime, the index is using 11 MB, and
I get a query that takes 2.448 ms. Then I reindex (it takes about 3 sec), 
and the index falls to 16 kB, and the query takes 0.035 ms.


So... problem solved for me: I just have to reindex every few hours.
BUT, this suggests a few remaining things:


1. The documentation still suggests that reindex should not be needed in 
"normal" operation... is this true? Or are the docs wrong? Or have I 
hit an unusual edge case? Does this suggest that an auto-reindexer would 
be a useful feature?



2. Is there any way to force the planner to use (or ignore) a specific 
index, for testing purposes, short of actually dropping the index?
This would be very useful for debugging, especially given that query 
plans can only really be fully tested on production systems, and 
dropping indexes is rather a bad thing to do while live operation is 
simultaneously happening on that server!
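
The closest workaround I know of is to toggle the planner's enable_*
settings for a single session, which discourages (rather than forbids)
a given plan type. A sketch, reusing the query below:

   -- Purely for testing: steer the planner away from the bitmap plan,
   -- then put the setting back.
   SET enable_bitmapscan = off;
   EXPLAIN ANALYSE SELECT * FROM tbl_tracker
    WHERE parcel_id_code = '32453' AND exit_state IS NULL;
   RESET enable_bitmapscan;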


Thanks again for your help.

Best wishes,

Richard





fsc_log=> explain analyse select * from tbl_tracker WHERE 
parcel_id_code='32453' AND exit_state IS NULL;


                                 QUERY PLAN
--
 Bitmap Heap Scan on tbl_tracker  (cost=20.81..23.32 rows=1 width=174) (actual time=2.408..2.408 rows=1 loops=1)
   Recheck Cond: ((exit_state IS NULL) AND (parcel_id_code = 32453))
   ->  BitmapAnd  (cost=20.81..20.81 rows=1 width=0) (actual time=2.403..2.403 rows=0 loops=1)
         ->  Bitmap Index Scan on tbl_tracker_exit_state_idx  (cost=0.00..9.25 rows=132 width=0) (actual time=2.378..2.378 rows=5 loops=1)
               Index Cond: (exit_state IS NULL)
         ->  Bitmap Index Scan on tbl_tracker_parcel_id_code_idx  (cost=0.00..11.30 rows=62 width=0) (actual time=0.022..0.022 rows=65 loops=1)
               Index Cond: (parcel_id_code = 32453)
 Total runtime: 2.448 ms


fsc_log=> REINDEX;


fsc_log=> explain analyse select * from tbl_tracker WHERE 
parcel_id_code='32453' AND exit_state IS NULL;


                                 QUERY PLAN
-
 Index Scan using tbl_tracker_performance_1_idx on tbl_tracker  (cost=0.00..5.27 rows=1 width=174) (actual time=0.007..0.008 rows=1 loops=1)
   Index Cond: (parcel_id_code = 32453)
 Total runtime: 0.035 ms
(3 rows)











Re: [PERFORM] Why does the query planner use two full indexes, when a dedicated partial index exists?

2012-12-27 Thread Richard Neill



The partial index is highly leveraged.  If every tuple in the
table is updated once, that amounts to every tuple in the index
being updated 25,000 times.

How so? That sounds like O(n^2) behaviour.

If the table has 5 million rows while the index has 200 (active) rows at
any given time, then updating every row in the table to null and back
again would be a 100% turnover of the table.  But each such change would
lead to an addition and then a deletion from the index.  So 100%
turnover of the table would be a 5 million / 200 = 25,000-fold turnover
of the index.


Sorry, I was being dense. I misread that as:
   "every time a single tuple in the table is updated, the entire index
(every row) is updated".
Yes, of course your explanation makes sense.



There is some code that allows a btree index entry to be killed (and so
its slot reused) without any vacuum, if a scan follows that entry
and finds the corresponding tuple in the table no longer visible to
anyone.  I have not examined this code, and don't know whether it is
doing its job but just isn't enough to prevent the bloat, or if for some
reason it is not applicable to your situation.



It looks like my solution is going to be a REINDEX invoked from cron, or 
maybe just every 100k inserts.
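
From cron, I imagine something like this (the index name is taken from
the EXPLAIN output earlier in the thread; note that plain REINDEX takes
an exclusive lock on the index, so the timing needs care on a live
system):

   # Hypothetical crontab entry: reindex the hot partial index every 4 hours.
   0 */4 * * * psql fsc_log -c 'REINDEX INDEX tbl_tracker_performance_1_idx;'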



In terms of trying to improve this behaviour for other PG users in the 
future, are there any further diagnostics I can run for you? Having found 
a special case, I'd like to help permanently resolve it if I can.



Thanks very much again.

Best wishes,

Richard









Re: [PERFORM] Why does the query planner use two full indexes, when a dedicated partial index exists?

2012-12-27 Thread Richard Neill



On 27/12/12 16:17, Jeff Janes wrote:


I still think your best bet is to get rid of the partial index and trade
the full one on (parcel_id_code) for one on (parcel_id_code, exit_state).
I think that will be much less fragile than reindexing in a cron job.



So, at the moment, I have 3 indexes:
  full:    parcel_id_code
  full:    exit_state
  partial: parcel_id_code where exit_state is null

Am I right that when you suggest just a single composite index on
(parcel_id_code, exit_state), instead of all three of the others,

it will allow me to optimally run all of the following?

1.  SELECT * from tbl_tracker WHERE parcel_id_code=22345 AND exit_state 
IS NULL


(this is the one we've been discussing)


2.  SELECT * from tbl_tracker where parcel_id_code=44533

3.  SELECT * from tbl_tracker where exit_state = 2

(2 and 3 are examples of queries I need to run for other purposes, 
unrelated to this thread, but which use the other indexes.)



Thanks,

Richard






Re: [PERFORM] explain analyze reports that my queries are fast but they run very slowly

2012-12-27 Thread Richard Neill

On 27/12/12 17:21, François Beausoleil wrote:


Le 2012-12-27 à 12:10, Nikolas Everett a écrit :


We just upgraded from 8.3 to 9.1 and we're seeing some performance problems.
When we EXPLAIN ANALYZE our queries, the explain results claim that the queries
are reasonably fast, but the wall-clock time is way, way longer.  Does anyone
know why this might happen?



Is it possible you missed an optimisation setting in the migration 
process? I made that mistake, and much later found performance was 
somewhat degraded (but surprisingly not as much as I'd expected) by my 
having failed to set effective_cache_size.


Also, if you just did a dump/restore, it might help to run ANALYSE once
(ANALYSE is normally run automatically via autovacuum, but the first 
time you insert the data, that may not have happened yet).
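
A minimal sanity check along those lines might be (purely a sketch, run
in psql on the restored database):

   -- Confirm the planner's cache-size hint survived the migration:
   SHOW effective_cache_size;

   -- Rebuild the planner statistics after the bulk load:
   ANALYSE VERBOSE;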


A side-effect of ANALYSE is that it will pull all the tables into the OS 
memory cache (or try to), which may give significantly faster results 
(try running the same query twice in succession: it's often 5x faster 
the 2nd time).


HTH,

Richard





Re: [PERFORM] Two Necessary Kernel Tweaks for Linux Systems

2013-01-02 Thread Richard Neill

Dear Shaun,

Thanks for that - it's really interesting to know.

On 02/01/13 21:46, Shaun Thomas wrote:

Hey everyone!

After much testing and hair-pulling, we've confirmed two kernel
settings that should always be modified in production Linux systems.
Especially new ones with the completely fair scheduler (CFS) as
opposed to the O(1) scheduler.


Does it apply to all types of production system, or just to certain 
workloads?


For example, what happens when there are only one or two concurrent
processes?  (i.e. there are always several more CPU cores than there are
actual connections).



* sched_autogroup_enabled

This is a relatively new patch which Linus lauded back in late 2010.
It basically groups tasks by TTY so perceived responsiveness is
improved. But on server systems, large daemons like PostgreSQL are
going to be launched from the same pseudo-TTY, and be effectively
choked out of CPU cycles in favor of less important tasks.



I've got several production servers using Postgres, and I'd like to 
squeeze a bit more performance out of them; but in all cases, one 
(sometimes two) CPU cores are occasionally maxed out, while several 
cores are always permanently idling. So does this apply here?
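
For reference, I assume the knob being discussed is toggled like this
(the value 0 is my assumption of the recommended server-side setting; I
haven't tested it yet):

   # Disable TTY autogrouping on a dedicated database server:
   echo 0 > /proc/sys/kernel/sched_autogroup_enabled
   # Or persistently, in /etc/sysctl.conf:
   #   kernel.sched_autogroup_enabled = 0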


Thanks for your advice,

Richard




Re: [PERFORM] PostgreSQL over internet

2013-01-26 Thread Richard Neill



On 27/01/13 02:45, [email protected] wrote:

On Sun, Jan 27, 2013 at 03:15:45AM +0300, belal hamed wrote:


I connect to my server through a 4 Mbps ADSL connection



Here is your "problem". You need to understand the performance
characteristics of your communication channel. ADSL is a VERY
asymmetric communications channel. Down is usually much faster
than up.


I'm not convinced that ADSL is your problem.

1. Try SSH-ing directly to the server, running psql, and issuing a query 
like this one:

   SELECT 'This is a test message' AS status;

This should run in under 1 ms; it also means that we don't have to worry 
about the details of your database schema for the purposes of this problem.


2. Try creating a simple SSH tunnel and using your application locally. 
For example, if your server runs Postgresql on port 5432, run this SSH 
command:

  ssh -L 5432:localhost:5432 your_server_hostname
and then connect to your LOCAL (localhost) port 5432; SSH will handle 
the port forwarding.  [Explanation: "localhost" in the SSH command is in 
the context of your_server_hostname]
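
Then, for instance (assuming a database called mydb):

   # Connect through the tunnel; "mydb" is a placeholder name:
   psql -h localhost -p 5432 mydb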

How does it work now?

3. Try the configuration you are currently using, but with the above query.

It should be possible to distinguish between:
  - slowness caused by the database query itself
  - slowness caused by the network itself
  - slowness caused by PostgreSQL/libpq overhead.

Hopefully, you'll be able to narrow it down a bit.

HTH,

Richard







Re: [PERFORM] What setup would you choose for postgresql 9.2 installation?

2013-03-04 Thread Richard Neill



On 04/03/13 13:52, Niels Kristian Schjødt wrote:

LSI MegaRAID SAS 9260-4i with four Intel SSDSC2CW240A3K5 SSDs OR four Hitachi 
Ultrastar 15K600 SAS drives?

My app is pretty write-heavy, and I have a lot of concurrent connections 
(~300), though I'm considering adding pgpool2 in front to increase throughput.



If you can afford it, there's no question in my mind that SSDs are the 
way to go. They can be 1000 times faster for random reads.


May I suggest that you do some experiments though, perhaps with just 
one disk of each type: you can get some pretty good illustrative tests 
with ordinary SATA drives in an ordinary laptop/desktop (but not via a 
USB adapter). I did this originally when evaluating the (then new) Intel 
X25 SSD.


The other things to note are:

* The filesystem matters. For the operation that matters most here, 
fdatasync(), ext2 is 2x as fast as ext4, which itself is much faster 
than ext3. BUT ext2's fsck is horrid, so we chose ext4.


* Will you enable the disk (or RAID controller) write cache?

* Have you enough RAM for your key tables (and indexes) to fit in 
memory? If not, 64GB of RAM is cheap these days.


* In some applications, you can get a speed boost by turning 
synchronous_commit off; this would mean that in a database crash, the 
last few seconds of transactions are potentially lost, even though the 
application thinks they were committed. You may find this an acceptable 
tradeoff (see the sketch after this list).


* Postgres doesn't always write straight to the tables, but uses the WAL 
(write-ahead-log). So the benefit of SSD performance for "random writes" 
is less relevant than for "random reads".
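
The synchronous_commit change mentioned above is a one-liner; a sketch
(it can be set globally in postgresql.conf, or per-session like this):

   -- Per-session: transactions stay atomic, but durability of the last
   -- few seconds is sacrificed for speed.
   SET synchronous_commit = off;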



Lastly, don't overdo the concurrent connections. You may end up with 
less throughput than if you let Postgres devote more resources to each 
request and let it finish faster.



Hope that helps,

Richard

