Re: [PERFORM] hardware upgrade, performance degrade?
On Mon, Mar 04, 2013 at 03:54:40PM -0700, Steven Crandell wrote:
> Here's our hardware break down.
>
> The logvg on the new hardware is 30MB/s slower (170 MB/s vs 200 MB/s)
> than the logvg on the older hardware, which was an immediately interesting
> difference, but we have yet to be able to create a test scenario that
> successfully implicates this slower log speed in our problems. That is
> something we are actively working on.
>
> Old server hardware:
>   Manufacturer: Dell Inc.
>   Product Name: PowerEdge R810
>   4x Intel(R) Xeon(R) CPU E7540 @ 2.00GHz
>   32x 16384 MB 1066 MHz DDR3
>   Controller 0: PERC H700 - 2 disk RAID-1 278.88 GB rootvg
>   Controller 1: PERC H800 - 18 disk RAID-6 2,178.00 GB datavg,
>                 4 drive RAID-10 272.25 GB logvg, 2 hot spare
>   2x 278.88 GB 15K SAS on controller 0
>   24x 136.13 GB 15K SAS on controller 1
>
> New server hardware:
>   Manufacturer: Dell Inc.
>   Product Name: PowerEdge R820
>   4x Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz
>   32x 32 GB 1333 MHz DDR3
>   Controller 0: PERC H710P - 4 disk RAID-6 557.75 GB rootvg
>   Controller 1: PERC H810 - 20 disk RAID-60 4,462.00 GB datavg,
>                 2 disk RAID-1 278.88 GB logvg, 2 hot spare
>   28x 278.88 GB 15K SAS drives total.

Hmm, you went from a striped (RAID 1/0) log volume on the old hardware to a non-striped (RAID 1) volume on the new hardware. That could explain the speed drop. Are the disks the same speed on the two systems?

--
-- rouilj
John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell)  603-643-9300 x 111

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
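A back-of-the-envelope check of the log-volume difference above. For sequential writes, RAID 1 delivers roughly one disk's throughput (both members write the same data), while RAID 10 stripes across mirror pairs, giving roughly n/2 disks' worth. The per-disk rate below is a hypothetical figure for illustration, not a measured one.

```python
# Rough sequential-write estimate for the two logvg layouts, assuming a
# hypothetical per-disk rate. Real controllers (write cache, BBU policy)
# can dominate these numbers.
def seq_write_mb_s(per_disk_mb_s, disks, level):
    if level == "raid1":
        return per_disk_mb_s               # mirror: no striping gain
    if level == "raid10":
        return per_disk_mb_s * disks // 2  # stripe across mirror pairs
    raise ValueError(level)

per_disk = 100  # assumed 15K SAS sequential rate, MB/s
print(seq_write_mb_s(per_disk, 4, "raid10"))  # old logvg: ~200
print(seq_write_mb_s(per_disk, 2, "raid1"))   # new logvg: ~100
```

The observed 200 vs 170 MB/s gap is smaller than this naive 2x model predicts, which is consistent with controller caching absorbing part of the difference.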
Re: [PERFORM] Why does the query planner use two full indexes, when a dedicated partial index exists? (solved?)
On Mon, Dec 24, 2012 at 06:37:11PM +0000, Richard Neill wrote:
> [...]
> So... problem solved for me: I just have to reindex every few hours.
> BUT, this suggests a few remaining things:
> [...]
> 2. Is there any way to force the planner to use (or ignore) a
> specific index, for testing purposes, short of actually dropping the
> index?
> This would be very useful for debugging, especially given that query
> plans can only really be fully tested on production systems, and
> that dropping indexes is rather a bad thing to do when live
> operation is simultaneously happening on that server!

I believe that:

  BEGIN;
  DROP INDEX ...;
  EXPLAIN ANALYZE ...;
  EXPLAIN ANALYZE ...;
  ROLLBACK;

will do what you want. IIUC Postgres allows DDL within transactions, so the DROP INDEX can be rolled back, and the operations within the transaction aren't visible to your other queries running outside the transaction.

http://wiki.postgresql.org/wiki/Transactional_DDL_in_PostgreSQL:_A_Competitive_Analysis
and
http://www.postgresql.org/docs/9.2/static/sql-dropindex.html

-- rouilj
Re: [PERFORM] "error with invalid page header" while vacuuming pgbench data
On Wed, May 25, 2011 at 03:19:59PM -0500, Kevin Grittner wrote:
> John Rouillard wrote:
> > On Mon, May 23, 2011 at 05:21:04PM -0500, Kevin Grittner wrote:
> >> John Rouillard wrote:
> >>
> >> > I seem to be able to provoke this error:
> >> >
> >> >   vacuum...ERROR: invalid page header in
> >> >   block 2128910 of relation base/16385/21476
> >>
> >> What version of PostgreSQL?
> >
> > Hmm, I thought I replied to this, but I haven't seen it come back
> > to me on list. It's postgres version: 8.4.5.
> >
> > rpm -q shows
> >
> >   postgresql84-server-8.4.5-1.el5_5.1
>
> I was hoping someone else would jump in, but I see that your
> previous post didn't copy the list, which solves *that* mystery.
>
> I'm curious whether you might have enabled one of the "it's OK to
> trash my database integrity to boost performance" options. (People
> with enough replication often feel that this *is* OK.) Please run
> the query on this page and post the results:
>
>   http://wiki.postgresql.org/wiki/Server_Configuration
>
> Basically, if fsync or full_page_writes is turned off and there was
> a crash, that explains it. If not, it provides more information to
> proceed.

Nope. Neither is turned off. I can't run the query at the moment since the system is in the middle of a memtest86+ check of 96GB of memory. The relevant parts of the config file from the configuration management system are:

  #fsync = on                # turns forced synchronization on or off
  #synchronous_commit = on   # immediate fsync at commit
  #wal_sync_method = fsync   # the default is the first option
  #full_page_writes = on     # recover from partial page writes

This is the same setup I use on all my data warehouse systems (with minor pgtune-type changes based on amount of memory). Running the query on another system (using ext3, CentOS 5.5) shows:

  version                        | PostgreSQL 8.4.5 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit
  archive_command                | if test ! -e /var/lib/pgsql/data/ARCHIVE_ENABLED; then exit 0; fi; test ! -f /var/bak/pgsql/%f && cp %p /var/bak/pgsql/%f
  archive_mode                   | on
  checkpoint_completion_target   | 0.9
  checkpoint_segments            | 64
  constraint_exclusion           | on
  custom_variable_classes        | pg_stat_statements
  default_statistics_target      | 100
  effective_cache_size           | 8GB
  lc_collate                     | en_US.UTF-8
  lc_ctype                       | en_US.UTF-8
  listen_addresses               | *
  log_checkpoints                | on
  log_connections                | on
  log_destination                | stderr,syslog
  log_directory                  | pg_log
  log_filename                   | postgresql-%a.log
  log_line_prefix                | %t %u@%d(%p)i:
  log_lock_waits                 | on
  log_min_duration_statement     | 2s
  log_min_error_statement        | warning
  log_min_messages               | notice
  log_rotation_age               | 1d
  log_rotation_size              | 0
  log_temp_files                 | 0
  log_truncate_on_rotation       | on
  logging_collector              | on
  maintenance_work_mem           | 1GB
  max_connections                | 300
  max_locks_per_transaction      | 128
  max_stack_depth                | 2MB
  port                           | 5432
  server_encoding                | UTF8
  shared_buffers                 | 4GB
  shared_preload_libraries       | pg_stat_statements
  superuser_reserved_connections | 3
  tcp_keepalives_count           | 0
  tcp_keepalives_idle            | 0
  tcp_keepalives_interval        | 0
  TimeZone                       | UTC
  wal_buffers                    | 32MB
  work_mem                       | 16MB

> You might want to re-start the thread on pgsql-general, though. Not
> everybody who might be able to help with a problem like this follows
> the performance list. Or, if you didn't set any of the dangerous
> configuration options, this sounds like a bug -- so pgsql-bugs might
> be even better.

Well, I am also managing to panic the kernel on some runs, so my guess is this is not only a postgres bug (if it's a postgres issue at all). As Greg mentioned in another followup, ext4 under CentOS 5.x may be an issue. I'll drop back to ext3 and see if I can replicate the corruption or crashes once I rule out some potential hardware issues. If I can replicate with ext3, then I'll follow up on -general or -bugs. Ext4 pgbench runs complete faster, but that doesn't help if it's not reliable.

Thanks for your help.

-- rouilj
Re: [PERFORM] "error with invalid page header" while vacuuming pgbench data
On Mon, May 23, 2011 at 05:21:04PM -0500, Kevin Grittner wrote:
> John Rouillard wrote:
>
> > I seem to be able to provoke this error:
> >
> >   vacuum...ERROR: invalid page header in
> >   block 2128910 of relation base/16385/21476
>
> What version of PostgreSQL?

Hmm, I thought I replied to this, but I haven't seen it come back to me on list. It's postgres version: 8.4.5.

rpm -q shows

  postgresql84-server-8.4.5-1.el5_5.1

-- rouilj
[PERFORM] "error with invalid page header" while vacuuming pgbench data
Hi all:

Not sure if this is a performance question or a generic admin question. I have the following script running on a host different from the database, using pgbench to test the database:

  pgbench -i (initial mode)
  vacuum analyze; (via psql, plus some other code to dump table sizes)
  pgbench (multiple connections, jobs etc.)

with a loop for setting different scales.

I seem to be able to provoke this error:

  vacuum...ERROR: invalid page header in block 2128910 of relation base/16385/21476

on a pgbench database created with a scale factor of 1000, relatively reliably (2 for 2). I am not seeing any disk errors from the raid controller or the operating system. Running pg_dumpall to check for errors reports:

  pg_dump: Error message from server: ERROR: invalid page header in block 401585 of relation base/16385/21476

which is different from the originally reported block.

Does anybody have any suggestions?

Configuration details:

  OS: CentOS 5.5
  Filesystem: data - ext4 (note 4 not 3); 6.6T formatted
              wal  - ext4; 1.5T formatted
  Raid: data - level 10, 8 disk wd2003; controller LSI MegaRAID SAS 9260-4i
        wal  - level 1, 2 disk wd2003; controller LSI MegaRAID SAS 9260-4i

Could it be an ext4 issue? It seems that ext4 may still be at the bleeding edge for postgres use.

Thanks for any thoughts, even if it's "go to the admin list".

-- rouilj
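For a sense of how much data a scale factor of 1000 involves: each unit of pgbench scale adds 1 branch, 10 tellers, and 100,000 accounts rows, with the accounts table dominating on disk. The bytes-per-row figure below is an approximation for illustration, not an exact on-disk measurement.

```python
# Rough row counts and size for a pgbench database at a given scale factor.
def pgbench_rows(scale):
    return {
        "branches": scale,            # 1 row per scale unit
        "tellers": 10 * scale,        # 10 rows per scale unit
        "accounts": 100_000 * scale,  # 100,000 rows per scale unit
    }

rows = pgbench_rows(1000)
approx_gb = rows["accounts"] * 100 / 1e9   # assume ~100 bytes/row on disk
print(rows["accounts"], round(approx_gb, 1))  # 100000000 10.0
```

So the vacuum that fails is walking a table on the order of 100 million rows, i.e. block 2128910 is deep into a multi-gigabyte relation.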
Re: [PERFORM] Using pgiosim realistically
On Mon, May 16, 2011 at 01:54:06PM -0400, Jeff wrote:
> Yep - you need multiple threads to get max throughput of your io.

I am running:

  ~/pgiosim -c -b 100G -v -t4 file[0-9]*

Will each thread move 100GB of data? I am seeing:

  158.69%, 4260 read, 0 written, 3407.64kB/sec 425.95 iops

Maybe the completion target percentage is off because of the threads?

-- rouilj
Re: [PERFORM] Using pgiosim realistically
On Mon, May 16, 2011 at 12:23:13PM -0400, Jeff wrote:
> On May 16, 2011, at 9:17 AM, John Rouillard wrote:
> > However, in my case I have an 8 disk raid 10 with a read only load (in
> > this testing configuration). Shouldn't I expect more iops than a
> > single disk can provide? Maybe pgiosim is hitting some other boundary
> > than just i/o?
>
> given your command line you are only running a single thread - use
> the -t argument to add more threads and that'll increase
> concurrency. a single process can only process so much at once and
> with multiple threads requesting different things the drive will
> actually be able to respond faster since it will have more work to
> do.
> I tend to test various levels - usually a single (-t 1 - the
> default) to get a base line, then -t (drives / 2), -t (#drives) up
> to probably 4x drives (you'll see iops level off).

Ok cool. I'll try that.

> > Also it turns out that pgiosim can only handle 64 files. I haven't
> > checked to see if this is a compile time changable item or not.
>
> that is a #define in pgiosim.c

So which is a better test: modifying the #define to allow specifying 200-300 1GB files, or using 64 files but increasing the size of my files to 2-3GB each, for a total file size two or three times the memory in my server (96GB)?

> also, are you running the latest pgiosim from pgfoundry?

Yup, version 0.5 from the foundry.

> the -w param to pgiosim has it rewrite blocks out as it runs. (it is
> a percentage).

Yup, I was running with that and getting low enough numbers that I switched to pure read tests. It looks like I just need multiple threads so I can have multiple reads/writes in flight at the same time.

-- rouilj
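The file-sizing question above can be worked out directly: to keep the OS page cache from serving most reads, the total working set should be a few times RAM, and with pgiosim's 64-file limit that fixes the per-file size.

```python
# Per-file size needed so the total data set is `multiple` x RAM,
# given a fixed number of files (pgiosim's 64-file #define).
def per_file_gb(ram_gb, multiple, nfiles):
    return ram_gb * multiple / nfiles

print(per_file_gb(96, 3, 64))   # 4.5 GB/file with 64 files, 3x 96GB RAM
print(int(96 * 3))              # or 288 x 1GB files if the #define is raised
```

Either option gives the same ~288GB working set; the 1GB-file variant also matches Postgres's 1GB segment size, at the cost of patching pgiosim.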
Re: [PERFORM] Using pgiosim realistically
On Sat, May 14, 2011 at 12:07:02PM -0500, k...@rice.edu wrote:
> On Fri, May 13, 2011 at 09:09:41PM +0000, John Rouillard wrote:
> > I am adding pgiosim to our testing for new database hardware and I am
> > seeing something I don't quite get and I think it's because I am using
> > pgiosim incorrectly.
> >
> > Specs:
> >
> >   OS: centos 5.5  kernel: 2.6.18-194.32.1.el5
> >   memory: 96GB
> >   cpu: 2x Intel(R) Xeon(R) X5690 @ 3.47GHz (6 core, ht enabled)
> >   disks: WD2003FYYS RE4
> >   raid: lsi - 9260-4i with 8 disks in raid 10 configuration
> >         1MB stripe size
> >         raid cache enabled w/ bbu
> >         disk caches disabled
> >   filesystem: ext3 created with -E stride=256
> >
> > I am seeing really poor (70) iops with pgiosim. According to:
> > http://www.tomshardware.com/reviews/2tb-hdd-7200,2430-8.html in the
> > database benchmark they are seeing ~170 iops on a single disk for
> > these drives. I would expect an 8 disk raid 10 should get better then
> > 3x the single disk rate (assuming the data is randomly distributed).
>
> Those drives are 7200 rpm drives which would give you a maximum write
> rate of 120/sec at best with the cache disabled. I actually think your
> 70/sec is closer to reality and what you should anticipate in real use.
> I do not see how they could make 170/sec. Did they strap a jet engine to
> the drive. :)

Hmm, I stated the disk cache was disabled. I should have said the disk write cache, but it's possible the read-ahead cache is disabled as well (not quite sure how to tell on the LSI cards). Also there isn't a lot of detail in what the database test mix is, and I haven't tried researching the site to see if they spec the exact test. If it included a lot of writes and they were being handled by a cache, then that could explain it.

However, in my case I have an 8 disk raid 10 with a read-only load (in this testing configuration). Shouldn't I expect more iops than a single disk can provide? Maybe pgiosim is hitting some other boundary than just I/O?

Also it turns out that pgiosim can only handle 64 files. I haven't checked to see if this is a compile-time changeable item or not.

-- rouilj
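The expectation in the question above can be made concrete: with a purely random read load, every spindle in a RAID 10 array (mirror copies included) can service reads independently, so the array should approach n_disks times the single-disk rate. The single-disk figure below is an assumed value for a 7200 rpm drive, not a measurement.

```python
# Idealized random-read iops for RAID 10: reads can be spread over all
# member disks, so the ceiling scales with spindle count.
def raid10_read_iops(single_disk_iops, n_disks):
    return single_disk_iops * n_disks

print(raid10_read_iops(120, 8))  # 960 -- far above the observed ~70
```

The gap between ~960 theoretical and 70 observed is what points at a single-threaded benchmark (one outstanding request at a time keeps only one disk busy), which is exactly the diagnosis in the follow-up reply.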
[PERFORM] Using pgiosim realistically
Hi all:

I am adding pgiosim to our testing for new database hardware, and I am seeing something I don't quite get. I think it's because I am using pgiosim incorrectly.

Specs:

  OS: CentOS 5.5  kernel: 2.6.18-194.32.1.el5
  memory: 96GB
  cpu: 2x Intel(R) Xeon(R) X5690 @ 3.47GHz (6 core, ht enabled)
  disks: WD2003FYYS RE4
  raid: LSI 9260-4i with 8 disks in raid 10 configuration
        1MB stripe size
        raid cache enabled w/ bbu
        disk caches disabled
  filesystem: ext3 created with -E stride=256

I am seeing really poor (70) iops with pgiosim. According to http://www.tomshardware.com/reviews/2tb-hdd-7200,2430-8.html, in the database benchmark they are seeing ~170 iops on a single disk for these drives. I would expect an 8 disk raid 10 to get better than 3x the single disk rate (assuming the data is randomly distributed).

To test I am using 5 100GB files with:

  sudo ~/pgiosim -c -b 100G -v file?

I am using 100G sizes to make sure that the data read and file sizes exceed the memory size of the system. However, if I use 5 1GB files (and still 100GB of read data) I see 200+ to 400+ iops at 50% of the 100GB of data read, which I assume means the data is cached in the OS cache and I am not really getting a hard drive/raid measurement of iops.

However, IIUC postgres will never have an index file greater than 1GB in size (http://www.postgresql.org/docs/8.4/static/storage-file-layout.html) and will just add 1GB segments, so the 1GB file size seems more realistic.

So do I want 100 (or probably 2 or 3 times more, say 300) 1GB files to feed pgiosim? That way I will have enough data that not all of it can be cached in memory, and the file sizes (and file operations: open/close) more closely match what postgres is doing with index files.

Also in the output of pgiosim I see:

  25.17%, 2881 read, 0 written, 2304.56kB/sec 288.07 iops

which I interpret (left to right) as the % of the 100GB that has been read, the number of read operations over some time period, the bytes read/written per second, and the I/O operations/sec. Iops always seems to be 1/10th of the read number (rounded up to an integer). Is this expected, and if so, does anybody know why?

While this is running, if I also run "iostat -p /dev/sdc 5" I see:

  Device:    tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
  sdc     166.40      2652.80         4.80      13264         24
  sdc1   2818.80         1.20       999.20          6       4996

which I am interpreting as 2818 read/io operations (corresponding more or less to "read" in the pgiosim output) to the partition, and of those only 166 are actually going to the drive??? with the rest handled from the OS cache. However the tps isn't increasing when I see pgiosim reporting:

  48.47%, 4610 read, 0 written, 3687.62kB/sec 460.95 iops

An iostat 5 output near the same time reports:

  Device:    tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
  sdc     165.87      2647.50         4.79      13264         24
  sdc1   2812.97         0.60       995.41          3       4987

so I am not sure if there is a correlation between the read and tps numbers. Also, I am assuming the blocks written are filesystem metadata, although that seems like a lot of data. If I stop pgiosim, the iostat output drops to 0 writes and reads as expected.

So does anybody have any comments on how to test with pgiosim and how to correlate the iostat and pgiosim outputs?

Thanks for your feedback.

-- rouilj
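The "iops is 1/10th of the read count" puzzle above is consistent with a simple explanation: if the status line covers a ~10 second reporting interval and each read is one 8 kB block (the Postgres page size), the numbers line up. This interval and block size are assumptions, not documented pgiosim behavior.

```python
# Sanity-check the pgiosim counters quoted above under the assumption of a
# 10-second reporting interval and 8 kB blocks.
BLOCK_KB = 8
INTERVAL_S = 10.0

def derive(reads):
    iops = reads / INTERVAL_S
    kb_per_sec = iops * BLOCK_KB
    return iops, kb_per_sec

print(derive(2881))  # ~(288.1, 2304.8) vs reported 288.07 iops, 2304.56kB/sec
print(derive(4610))  # ~(461.0, 3688.0) vs reported 460.95 iops, 3687.62kB/sec
```

Both reported lines match to within rounding, so "reads = 10 x iops" would just be the interval length, not an anomaly.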
[PERFORM] OT (slightly) testing for data loss on an SSD drive due to power failure
Hi all:

I realize this is slightly off topic, but it is an issue of concern with the use of SSDs. We are setting up a storage server under Solaris using ZFS. We have a couple of SSDs (2 x 160GB Intel X25-M MLC SATA) acting as the ZIL (write journal) and are trying to see if it is safe to use in a power-fail situation. Our testing (10 runs) hasn't shown any data loss, but I am not sure our testing has been running long enough or is valid, so I hoped the people here who have tested an SSD for data loss may have some guidance.

The testing method is to copy a bunch of files over NFS to the server with the ZIL. When the copy is running along, pull the power to the server. The NFS client will stop, and if the client got a message that block X was written safely to the ZIL, it will continue writing with block X+1. After the server comes back up and the copies resume/finish, the files are checksummed. If block X went missing, the checksums will fail and we will have our proof.

We are looking at how to max out the writes to the SSD, on the theory that we need to fill the DRAM buffer on the SSD and get it saturated enough that it can't flush data to permanent storage as fast as the data is coming in. (I.e. make it a writeback cache with a longer delay, so it's more likely to drop data.)

Does anybody have any comments or testing methodologies that don't involve using an actual postgres instance?

Thanks for your help.

-- rouilj
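The post-crash verification step described above can be sketched as a small script: checksum the originals, re-checksum the copies after the server comes back, and report any mismatches. Directory paths here are hypothetical placeholders for the NFS source and destination.

```python
# Minimal sketch of the checksum-verification step: any copy whose SHA-256
# differs from its source lost (or corrupted) at least one block.
import hashlib
import pathlib

def sha256(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(src_dir, dst_dir):
    """Return the sorted names of files whose copies don't match the originals."""
    bad = []
    for src in pathlib.Path(src_dir).iterdir():
        dst = pathlib.Path(dst_dir) / src.name
        if not dst.exists() or sha256(src) != sha256(dst):
            bad.append(src.name)
    return sorted(bad)
```

Any name returned by `verify()` after a power-pull run is the proof of loss the test is designed to find.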
[PERFORM] Assessing performance of fetches
Hi all:

An application running against a postgres 8.4.5 database under CentOS 5.5 uses cursors (I think via SQLAlchemy). To look for database performance issues I log any query that takes more than 2 seconds to complete. I am seeing:

  2011-04-16 00:55:33 UTC user@database(3516): LOG: duration: 371954.811 ms statement: FETCH FORWARD 1 FROM c_2aaeea50_a08

While I obviously have a problem here, is there any way to log the actual SELECT associated with the cursor, other than logging all statements?

Also, once I have the SELECT statement, does the fact that it is associated with a fetch/cursor change the steps I should take in tuning it, compared to somebody just issuing a normal SELECT?

Thanks for any ideas.

-- rouilj
Re: [PERFORM] Are we in the ballpark?
On Wed, Feb 02, 2011 at 10:06:53AM -0700, Wayne Conrad wrote:
> On 02/01/11 18:30, Greg Smith wrote:
> >> Bonnie++ (-f -n 0 -c 4)
> >>   $PGDATA/xlog (RAID1)
> >>     random seek: 369/sec
> >>     block out: 87 MB/sec
> >>     block in: 180 MB/sec
> >>   $PGDATA (RAID10, 12 drives)
> >>     random seek: 452
> >>     block out: 439 MB/sec
> >>     block in: 881 MB/sec
> >>
> >> sysbench test of fsync (commit) rate:
> >>
> >>   $PGDATA/xlog (RAID1)
> >>     cache off: 29 req/sec
> >>     cache on: 9,342 req/sec
> >>   $PGDATA (RAID10, 12 drives)
> >>     cache off: 61 req/sec
> >>     cache on: 8,191 req/sec
> >
> > That random seek rate is a little low for 12 drives, but that's probably
> > the limitations of the 3ware controller kicking in there. Your "cache
> > off" figures are really weird though; I'd expect those both to be around
> > 100. Makes me wonder if something weird is happening in the controller,
> > or if there was a problem with your config when testing that. Not a big
> > deal, really--the cached numbers are normally going to be the important
> > ones--but it is odd.
>
> I also thought the "cache off" figures were odd. I expected
> something much closer to 120 req/sec (7200 rpm drives). I probably
> won't investigate that with any vigor, since the cache-on numbers
> are OK.

You may want to look into the "cache off" figures a little more. We run a number of battery-backed raid controllers, and we test the batteries every 6 months or so. When we test the batteries, the cache goes offline (as it should) to help keep the data valid. If you need to test your raid card batteries (nothing like having a battery with only a 6 hour runtime when it takes you a couple of days MTTR), can your database app survive with that low a commit rate? As you said, you were expecting something almost 4-5x faster with 7200 rpm disks.

-- rouilj
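The 120 req/sec expectation quoted above comes from rotational latency: with no write cache, each fsync must wait for the platter to come around, so one commit per revolution is the rough ceiling for a single rotating disk.

```python
# Rough upper bound on uncached commit (fsync) rate for a rotating disk:
# at best one durable write per revolution.
def max_fsync_per_sec(rpm):
    return rpm / 60.0

print(max_fsync_per_sec(7200))   # 120.0 -- the figure cited for 7200 rpm
print(max_fsync_per_sec(15000))  # 250.0 -- for comparison, 15K drives
```

The measured 29 and 61 req/sec falling well under 120 is what makes the cache-off numbers look suspicious, and is also what a database on an array with its BBU cache offline would be limited to.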
Re: [PERFORM] performance on new linux box
On Thu, Jul 08, 2010 at 09:31:32AM -0700, Ryan Wexler wrote:
> Thanks a lot for all the comments. The fact that both my windows box and
> the old linux box both show a massive performance improvement over the new
> linux box seems to point to hardware to me. I am not sure how to test the
> fsync issue, but i don't see how that could be it.
>
> The raid card the server has in it is:
>   3Ware 4 Port 9650SE-4LPML RAID Card
>
> Looking it up, it seems to indicate that it has BBU

By "looking it up", I assume you mean running tw_cli and looking at the output to make sure the BBU is enabled and the cache is turned on for the raid array (u0 or u1, ...)?

-- rouilj
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
On Mon, Nov 16, 2009 at 03:20:12PM -0500, Greg Smith wrote:
> Robert Schnabel wrote:
> > Nope. Forgive my ignorance but isn't that what a UPS is for anyway?
> > Along with a BBU controller.
>
> If you have a UPS *and* a BBU controller, then things are much
> better--those should have a write cache that insulates you from the
> worst of the problems. But just a UPS alone doesn't help you very much:
>
> 1) A UPS is built with a consumable (the battery), and they do wear
> out. Unless you're proactive about monitoring UPS battery quality and
> doing tests, you won't find this out until the first time the power goes
> out and the UPS doesn't work anymore.

Well, the BBU is just another battery (ok, some are capacitors, but...), so the same caveats apply to a BBU raid card. We test ours every 6 months and fail them if they have less than a 5-day capacity: a failure over a long weekend (3 days) plus 1-2 day(s) to fix the issue (replace power supply, mobo, etc.).

-- rouilj
Re: [PERFORM] Very poor performance loading 100M of sql data using copy
On Mon, Apr 28, 2008 at 02:16:02PM -0400, Greg Smith wrote:
> On Mon, 28 Apr 2008, John Rouillard wrote:
>
> > 2008-04-21 11:36:43 UTC @(2761)i: LOG: checkpoints ... (27 seconds apart)
> > so I changed:
> >   checkpoint_segments = 30
> >   checkpoint_warning = 150
>
> That's good, but you might go higher than 30 for a bulk loading operation
> like this, particularly on 8.1 where checkpoints are no fun. Using 100 is
> not unreasonable.

Ok, I can do that. I chose 30 to make the WAL logs span the 5 minute checkpoint_timeout = 300, so that the 30 segments wouldn't wrap around before the 5 minute checkpoint that usually occurs. Maybe I should increase both the timeout and the segments?

> > shared_buffers = 3000
> > I don't see any indication in the docs that increasing shared memory
> > would help speed up a copy operation.
>
> The index blocks use buffer space, and what ends up happening if there's
> not enough memory is they are written out more than they need to be (and
> with your I/O hardware you need to avoid writes unless absolutely
> necessary).

I forgot to mention: the raid 1/0 is on a 3ware 9550SX-4LP raid card, set up as raid 1/0. The write cache is on and autoverify is turned off.

> Theoretically the OS is caching around that situation but
> better to avoid it.

The system is using 6-8GB of memory for cache.

> You didn't say how much RAM you have,

16GB total, but 8GB or so is taken up with other processes.

> but you should
> start by a factor of 10 increase to 30,000 and see if that helps; if so,
> try making it large enough to use 1/4 of total server memory. 3000 is
> only giving the server 24MB of RAM to work with, and it's unfair to expect
> it to work well in that situation.

So swap the memory usage from the OS cache to the postgresql process. Using 1/4 as a guideline, it sounds like 600,000 buffers (approx 4GB) is a better setting. So I'll try 30,000 to start and see what it does to the other processes on the box.

> While not relevant to this exercise you'll need to set
> effective_cache_size to a useful value one day as well.

This is a very lightly loaded database, a few queries/hour usually scattered across the data set, so hopefully that won't be much of an issue.

-- rouilj
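The shared_buffers arithmetic in the exchange above is easy to verify: each buffer is one 8 kB page, so the buffer count maps directly to RAM.

```python
# Convert a shared_buffers page count to megabytes (8 kB pages).
PAGE_KB = 8

def buffers_to_mb(nbuffers):
    return nbuffers * PAGE_KB / 1024.0

print(buffers_to_mb(3000))            # ~23.4 MB -- the "24MB" current setting
print(buffers_to_mb(30000))           # ~234 MB -- the factor-of-10 suggestion
print(buffers_to_mb(600000) / 1024)   # ~4.6 GB -- roughly 1/4 of a 16GB box
```

This is also why the 600,000 figure is only "approx 4GB": it actually works out to about 4.6 GB, slightly over a quarter of total RAM.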
Re: [PERFORM] Very poor performance loading 100M of sql data using copy
On Tue, Apr 29, 2008 at 05:19:59AM +0930, Shane Ambler wrote:
> John Rouillard wrote:
>
> > We can't do this as we are backfilling a couple of months of data
> > into tables with existing data.
>
> Is this a one off data loading of historic data or an ongoing thing?

Yes, it's a one-off bulk data load of many days of data. The daily loads also take 3 hours, but that is ok since we only do those once a day, so we have 21 hours of slack in the schedule 8-).

> > > > The only indexes we have to drop are the ones on the primary keys
> > > > (there is one non-primary key index in the database as well).
>
> If this amount of data importing is ongoing then one thought I would try
> is partitioning (this could be worthwhile anyway with the amount of data
> you appear to have).
> Create an inherited table for the month being imported, load the data
> into it, then add the check constraints, indexes, and modify the
> rules/triggers to handle the inserts to the parent table.

Hmm, interesting idea, worth considering if we have to do this again (I hope not).

Thanks for the reply.

-- rouilj
Re: [PERFORM] Very poor performance loading 100M of sql data using copy
On Mon, Apr 28, 2008 at 06:53:09PM +0100, Heikki Linnakangas wrote:
> John Rouillard wrote:
> > We are running postgresql-8.1.3 under Centos 4
>
> You should upgrade, at least to the latest minor release of the 8.1
> series (8.1.11), as there has been a bunch of important bug and security
> fixes. Or even better, upgrade to 8.3, which has reduced the storage
> size of especially variable length datatypes like text/char/varchar in
> particular. As your COPY is I/O bound, reducing storage size will
> translate directly to improved performance.

Yup. Just saw that suggestion in an unrelated email.

> > dm-6 is where the data files reside and dm-4 is where the WAL archives
> > are kept. Note all the DMs are on the same RAID 0 device /dev/sda2.
>
> Another reason to upgrade to 8.3: if you CREATE or TRUNCATE the table in
> the same transaction as you COPY into it, you can avoid WAL logging of
> the loaded data, which will in the best case double your performance as
> your WAL is on the same physical drives as the data files.

We can't do this as we are backfilling a couple of months of data into tables with existing data.

> > The only indexes we have to drop are the ones on the primary keys
> > (there is one non-primary key index in the database as well).
> >
> > Can you drop an index on the primary key for a table and add it back
> > later? Am I correct in saying: the primary key index is what enforces
> > the unique constraint in the table? If the index is dropped and
> > non-unique primary key data has been added, what happens when you
> > re-add the index?
>
> Yes, the index is what enforces the uniqueness. You can drop the primary
> key constraint, and add it back after the load with ALTER TABLE. If the
> load introduces any non-unique primary keys, adding the primary key
> constraint will give you an error and fail.

That's the part I am worried about. I guess using psql to delete the problem rows and then re-adding the index will work.

> Dropping and recreating the indexes is certainly worth trying.

Thanks for the info.

-- rouilj
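The failure mode discussed above is worth illustrating: re-adding the primary key constraint fails if the bulk load introduced duplicate key values, so you need to find the offenders first. The sketch below does it in Python over in-memory rows; in SQL the equivalent is a GROUP BY ... HAVING count(*) > 1 query over the key column.

```python
# Find key values that occur more than once -- the rows that would make
# "ALTER TABLE ... ADD PRIMARY KEY" fail after a bulk load.
from collections import Counter

def duplicate_keys(rows, key=lambda r: r[0]):
    counts = Counter(key(r) for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)

rows = [(1, "a"), (2, "b"), (2, "c"), (3, "d"), (3, "e")]
print(duplicate_keys(rows))  # [2, 3]
```

Once the duplicates are identified and the extra rows deleted, re-adding the constraint succeeds and rebuilds the unique index in one pass.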
[PERFORM] Very poor performance loading 100M of sql data using copy
t in the table? If the index is dropped and non-unique primary key data has been added, what happens when you re-add the index?

Does anybody have any things to check/ideas on why loading a 100MB sql file using psql would take 3 hours?

Thanks in advance for any ideas.

-- rouilj