Re: [PERFORM] hardware upgrade, performance degrade?

2013-03-04 Thread John Rouillard
On Mon, Mar 04, 2013 at 03:54:40PM -0700, Steven Crandell wrote:
> Here's our hardware break down.
> 
> The logvg on the new hardware is 30 MB/s slower (170 MB/s vs. 200 MB/s)
> than the logvg on the older hardware, which was an immediately interesting
> difference, but we have yet to create a test scenario that successfully
> implicates this slower log speed in our problems. That is something we are
> actively working on.
> 
> 
> Old server hardware:
> Manufacturer: Dell Inc.
> Product Name: PowerEdge R810
> 4x Intel(R) Xeon(R) CPU   E7540  @ 2.00GHz
> 32x16384 MB 1066 MHz DDR3
> Controller 0: PERC H700 - 2 disk RAID-1 278.88 GB rootvg
> Controller 1: PERC H800 - 18 disk RAID-6 2,178.00 GB datavg, 4
> drive RAID-10 272.25 GB logvg, 2 hot spare
> 2x 278.88 GB 15K SAS on controller 0
> 24x 136.13 GB 15K SAS on controller 1
> 
> New server hardware:
> Manufacturer: Dell Inc.
> Product Name: PowerEdge R820
> 4x Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz
> 32x32 GB 1333 MHz DDR3
> Controller 0: PERC H710P  - 4 disk RAID-6 557.75 GB rootvg
> Controller 1: PERC H810 - 20 disk RAID-60 4,462.00 GB datavg, 2
> disk RAID-1  278.88 GB logvg, 2 hot spare
> 28x278.88 GB 15K SAS drives total.

Hmm, you went from a striped (raid 1/0) log volume on the old hardware
to a non-striped (raid 1) volume on the new hardware. The old 4-drive
raid 10 stripes writes across two mirrored pairs, while the new 2-disk
raid 1 writes to a single pair, so that alone could explain the speed
drop. Are the disks the same speed on the two systems?
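
If you want to compare the two logvg volumes head to head with the
filesystem cache out of the way, a quick dd pass is usually enough for a
first number. This is only a sketch, and the mount point is hypothetical;
substitute wherever your logvg is actually mounted:

  # run the same commands on both boxes and compare MB/s
  dd if=/dev/zero of=/pg/log/ddtest bs=1M count=8192 oflag=direct conv=fsync
  # closer to what the WAL actually does: small synchronous writes
  dd if=/dev/zero of=/pg/log/ddtest bs=8k count=10000 oflag=dsync
  rm /pg/log/ddtest

The first line approximates the streaming numbers you quoted; the second
is nearer to commit behaviour, which is often the number that matters for
a log volume.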

-- 
-- rouilj

John Rouillard   System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111




Re: [PERFORM] Why does the query planner use two full indexes, when a dedicated partial index exists? (solved?)

2012-12-24 Thread John Rouillard
On Mon, Dec 24, 2012 at 06:37:11PM +, Richard Neill wrote:
> [...]
> So... problem solved for me: I just have to reindex every few hours.
> BUT, this suggests a few remaining things:
> [...]
> 2. Is there any way to force the planner to use (or ignore) a
> specific index, for testing purposes, short of actually dropping the
> index?
> This would be very useful for debugging, especially given that query
> plans can only really be fully tested on production systems, and
> that dropping indexes is rather a bad thing to do when live
> operation is simultaneously happening on that server!

I believe that:

  BEGIN;
  DROP INDEX ...;       -- the index you want the planner to ignore
  EXPLAIN ANALYZE ...;  -- the query under test
  EXPLAIN ANALYZE ...;
  ROLLBACK;             -- undoes the DROP; the index is untouched

will do what you want. IIUC Postgres allows DDL within transactions, so
the DROP INDEX can be rolled back, and the change isn't visible to other
queries running outside the transaction. One caveat: DROP INDEX takes an
exclusive lock on the table, so concurrent queries against that table
will block until the ROLLBACK; keep the test transaction short on a live
system.

  
http://wiki.postgresql.org/wiki/Transactional_DDL_in_PostgreSQL:_A_Competitive_Analysis

and

  http://www.postgresql.org/docs/9.2/static/sql-dropindex.html

-- 
    -- rouilj

John Rouillard   System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111




Re: [PERFORM] "error with invalid page header" while vacuuming pgbench data

2011-05-25 Thread John Rouillard
On Wed, May 25, 2011 at 03:19:59PM -0500, Kevin Grittner wrote:
> John Rouillard  wrote:
> > On Mon, May 23, 2011 at 05:21:04PM -0500, Kevin Grittner wrote:
> >> John Rouillard  wrote:
> >>  
> >> >  I seem to be able to provoke this error:
> >> > 
> >> >vacuum...ERROR:  invalid page header in
> >> > block 2128910 of relation base/16385/21476
> >>  
> >> What version of PostgreSQL?
> > 
> > Hmm, I thought I replied to this, but I haven't seen it come back
> > to me on list.  It's postgres version: 8.4.5.
> > 
> > rpm -q shows
> > 
> >postgresql84-server-8.4.5-1.el5_5.1
>  
> I was hoping someone else would jump in, but I see that your
> previous post didn't copy the list, which solves *that* mystery.
>  
> I'm curious whether you might have enabled one of the "it's OK to
> trash my database integrity to boost performance" options.  (People
> with enough replication often feel that this *is* OK.)  Please run
> the query on this page and post the results:
>  
> http://wiki.postgresql.org/wiki/Server_Configuration
>  
> Basically, if fsync or full_page_writes is turned off and there was
> a crash, that explains it.  If not, it provides more information to
> proceed.

Nope. Neither is turned off. I can't run the query at the moment since
the system is in the middle of a memtest86+ check of 96GB of
memory. The relevant parts of the config file, from the configuration
management system, are:

  #fsync = on # turns forced synchronization
  # on or off
  #synchronous_commit = on# immediate fsync at commit
  #wal_sync_method = fsync# the default is the first option 

  #full_page_writes = on  # recover from partial page writes

This is the same setup I use on all my data warehouse systems (with
minor pgtune-type changes based on the amount of memory). Running the
query on another system (ext3, CentOS 5.5) shows:

 version                        | PostgreSQL 8.4.5 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit
 archive_command                | if test ! -e /var/lib/pgsql/data/ARCHIVE_ENABLED; then exit 0; fi; test ! -f /var/bak/pgsql/%f && cp %p /var/bak/pgsql/%f
 archive_mode   | on
 checkpoint_completion_target   | 0.9
 checkpoint_segments| 64
 constraint_exclusion   | on
 custom_variable_classes| pg_stat_statements
 default_statistics_target  | 100
 effective_cache_size   | 8GB
 lc_collate | en_US.UTF-8
 lc_ctype   | en_US.UTF-8
 listen_addresses   | *
 log_checkpoints| on
 log_connections| on
 log_destination| stderr,syslog
 log_directory  | pg_log
 log_filename   | postgresql-%a.log
 log_line_prefix| %t %u@%d(%p)i: 
 log_lock_waits | on
 log_min_duration_statement | 2s
 log_min_error_statement| warning
 log_min_messages   | notice
 log_rotation_age   | 1d
 log_rotation_size  | 0
 log_temp_files | 0
 log_truncate_on_rotation   | on
 logging_collector  | on
 maintenance_work_mem   | 1GB
 max_connections| 300
 max_locks_per_transaction  | 128
 max_stack_depth| 2MB
 port   | 5432
 server_encoding| UTF8
 shared_buffers | 4GB
 shared_preload_libraries   | pg_stat_statements
 superuser_reserved_connections | 3
 tcp_keepalives_count   | 0
 tcp_keepalives_idle| 0
 tcp_keepalives_interval| 0
 TimeZone   | UTC
 wal_buffers| 32MB
 work_mem   | 16MB

> You might want to re-start the thread on pgsql-general, though.  Not
> everybody who might be able to help with a problem like this follows
> the performance list.  Or, if you didn't set any of the dangerous
> configuration options, this sounds like a bug -- so pgsql-bugs might
> be even better.

I am also managing to panic the kernel on some runs, so my guess is
this is not only a postgres bug (if it's a postgres issue at all).

As Greg mentioned in another followup, ext4 under CentOS 5.x may be an
issue. I'll drop back to ext3 and see if I can replicate the
corruption or crashes once I rule out some potential hardware issues.

If I can replicate with ext3, then I'll follow up on -general or
-bugs.

Ext4 pgbench runs do complete faster, but that doesn't help much if it's not reliable ...

Thanks for your help.

-- 
-- rouilj

Re: [PERFORM] "error with invalid page header" while vacuuming pgbench data

2011-05-25 Thread John Rouillard
On Mon, May 23, 2011 at 05:21:04PM -0500, Kevin Grittner wrote:
> John Rouillard  wrote:
>  
> >  I seem to be able to provoke this error:
> > 
> >vacuum...ERROR:  invalid page header in
> > block 2128910 of relation base/16385/21476
>  
> What version of PostgreSQL?

Hmm, I thought I replied to this, but I haven't seen it come back to
me on list.  It's postgres version: 8.4.5.

rpm -q shows

   postgresql84-server-8.4.5-1.el5_5.1

-- 
    -- rouilj

John Rouillard   System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111



[PERFORM] "error with invalid page header" while vacuuming pgbench data

2011-05-23 Thread John Rouillard
Hi all:

Not sure if this is a performance question or a generic admin
question. I have the following script running on a host different from
the database to use pgbench to test the database:

  pgbench -i  (initial mode)
  psql -c 'vacuum analyze;'  (and some other code to dump table sizes)
  pgbench  (multiple connections, jobs, etc.)

with a loop for setting different scales 
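
Roughly, the loop looks like the sketch below (the database name, scale
list, and client counts are made up for illustration, and option spellings
vary a bit between pgbench versions):

  for scale in 100 500 1000; do
      pgbench -i -s $scale bench            # initialize at this scale factor
      psql -d bench -c 'VACUUM ANALYZE;'    # plus the table-size reporting
      pgbench -c 16 -j 4 -T 600 bench       # multiple connections/jobs
  done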

 I seem to be able to provoke this error:

   vacuum...ERROR:  invalid page header in
block 2128910 of relation base/16385/21476

on a pgbench database created with a scale factor of 1000 relatively
reliably (2 for 2). I am not seeing any disk errors from the raid
controller or the operating system.

Running pg_dumpall to check for errors reports:

   pg_dump: Error message from server: ERROR:  invalid page header in 
block 401585 of relation base/16385/21476

which is different from the originally reported block.

Does anybody have any suggestions?

Configuration details.

OS: centos 5.5
Filesystem: data - ext4 (note 4 not 3); 6.6T formatted
wal  - ext4; 1.5T formatted
Raid: data - level 10, 8 disk wd2003; controller LSI MegaRAID SAS 9260-4i
  wal  - level 1,  2 disk wd2003; controller LSI MegaRAID SAS 9260-4i

Could it be an ext4 issue? It seems that ext4 may still be at the
bleeding edge for postgres use.

Thanks for any thoughts even if it's go to the admin list.

-- 
-- rouilj

John Rouillard   System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111



Re: [PERFORM] Using pgiosim realistically

2011-05-17 Thread John Rouillard
On Mon, May 16, 2011 at 01:54:06PM -0400, Jeff wrote:
> Yep - you need multiple threads to get max throughput of your io.

I am running:

   ~/pgiosim -c -b 100G -v -t4 file[0-9]*

Will each thread move 100GB of data? I am seeing:

  158.69%,   4260 read,  0 written, 3407.64kB/sec  425.95 iops

Maybe the completion target percentage is off because of the threads?

-- 
-- rouilj

John Rouillard   System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111



Re: [PERFORM] Using pgiosim realistically

2011-05-16 Thread John Rouillard
On Mon, May 16, 2011 at 12:23:13PM -0400, Jeff wrote:
> On May 16, 2011, at 9:17 AM, John Rouillard wrote:
> >However, in my case I have an 8 disk raid 10 with a read only load (in
> >this testing configuration). Shouldn't I expect more iops than a
> >single disk can provide? Maybe pgiosim is hitting some other boundary
> >than just i/o?
> >
> 
> given your command line you are only running a single thread - use
> the -t argument to add more threads and that'll increase
> concurrency.  a single process can only process so much at once and
> with multiple threads requesting different things the drive will
> actually be able to respond faster since it will have more work to
> do.
> I tend to test various levels - usually a single (-t 1 - the
> default) to get a base line, then -t (drives / 2), -t (#drives) up
> to probably 4x drives (you'll see iops level off).

Ok cool. I'll try that.
 
> >Also it turns out that pgiosim can only handle 64 files. I haven't
> >checked to see if this is a compile time changable item or not.
> 
> that is a #define in pgiosim.c

So which is a better test: modifying the #define to allow specifying
200-300 1GB files, or using 64 files but increasing each file to 2-3GB
so that the total data size is two or three times the memory in my
server (96GB)?

> also, are you running the latest pgiosim from pgfoundry?

Yup, version 0.5 from pgfoundry.

> the -w param to pgiosim has it rewrite blocks out as it runs. (it is
> a percentage).

Yup, I was running with that and getting numbers low enough that I
switched to pure read tests. It looks like I just need multiple
threads so I can have multiple reads/writes in flight at the same
time.

-- 
-- rouilj

John Rouillard   System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111



Re: [PERFORM] Using pgiosim realistically

2011-05-16 Thread John Rouillard
On Sat, May 14, 2011 at 12:07:02PM -0500, k...@rice.edu wrote:
> On Fri, May 13, 2011 at 09:09:41PM +0000, John Rouillard wrote:
> > I am adding pgiosim to our testing for new database hardware and I am
> > seeing something I don't quite get and I think it's because I am using
> > pgiosim incorrectly.
> > 
> > Specs:
> > 
> >   OS: centos 5.5 kernel: 2.6.18-194.32.1.el5
> >   memory: 96GB
> >   cpu: 2x Intel(R) Xeon(R) X5690  @ 3.47GHz (6 core, ht enabled)
> >   disks: WD2003FYYS RE4
> >   raid: lsi - 9260-4i with 8 disks in raid 10 configuration
> >   1MB stripe size
> >   raid cache enabled w/ bbu
> >   disk caches disabled
> >   filesystem: ext3 created with -E stride=256
> > 
> > I am seeing really poor (70) iops with pgiosim.  According to:
> > http://www.tomshardware.com/reviews/2tb-hdd-7200,2430-8.html in the
> > database benchmark they are seeing ~170 iops on a single disk for
> > these drives. I would expect an 8 disk raid 10 should get better then
> > 3x the single disk rate (assuming the data is randomly distributed).
> Those drives are 7200 rpm drives which would give you a maximum write
> rate of 120/sec at best with the cache disabled. I actually think your
> 70/sec is closer to reality and what you should anticipate in real use.
> I do not see how they could make 170/sec. Did they strap a jet engine to
> the drive. :)

Hmm, I stated the disk cache was disabled. I should have said the disk
write cache; it's possible the readahead cache is disabled as well
(not quite sure how to tell on the LSI cards). Also, there isn't a lot
of detail on what the database test mix is, and I haven't tried
researching the site to see if they spec the exact test. If it included
a lot of writes and they were being handled by a cache, then that could
explain it.

However, in my case I have an 8 disk raid 10 with a read only load (in
this testing configuration). Shouldn't I expect more iops than a
single disk can provide? Maybe pgiosim is hitting some other boundary
than just i/o?

Also it turns out that pgiosim can only handle 64 files. I haven't
checked to see if this is a compile-time changeable item or not.

-- 
-- rouilj

John Rouillard   System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111



[PERFORM] Using pgiosim realistically

2011-05-13 Thread John Rouillard
Hi all:

I am adding pgiosim to our testing for new database hardware and I am
seeing something I don't quite get and I think it's because I am using
pgiosim incorrectly.

Specs:

  OS: centos 5.5 kernel: 2.6.18-194.32.1.el5
  memory: 96GB
  cpu: 2x Intel(R) Xeon(R) X5690  @ 3.47GHz (6 core, ht enabled)
  disks: WD2003FYYS RE4
  raid: lsi - 9260-4i with 8 disks in raid 10 configuration
  1MB stripe size
  raid cache enabled w/ bbu
  disk caches disabled
  filesystem: ext3 created with -E stride=256

I am seeing really poor (~70) iops with pgiosim.  According to:
http://www.tomshardware.com/reviews/2tb-hdd-7200,2430-8.html in the
database benchmark they are seeing ~170 iops on a single disk for
these drives. I would expect an 8 disk raid 10 to do better than
3x the single disk rate (assuming the data is randomly distributed).

To test I am using 5 100GB files with

sudo ~/pgiosim -c -b 100G -v file?

I am using 100G sizes to make sure that the data read and files sizes
exceed the memory size of the system.

However, if I use 5 1GB files (still reading 100GB of data) I see 200+
to 400+ iops by the time 50% of the 100GB has been read, which I assume
means the data is cached in the OS cache and I am not really measuring
hard drive/raid iops.

However, IIUC postgres will never have an index file greater than 1GB
in size
(http://www.postgresql.org/docs/8.4/static/storage-file-layout.html)
and will just add 1GB segments, so 1GB files seem to be more
realistic.

So do I want 100 (or probably 2 or 3 times more, say 300) 1GB files to
feed pgiosim? That way I would have enough data that not all of it can
be cached in memory, and the file sizes (and file operations:
open/close) would more closely match what postgres is doing with index
files?
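
Generating that many files is just a shell loop; a minimal sketch (file
names and count are arbitrary, pgiosim only needs the blocks to exist):

  for i in $(seq -w 1 300); do
      dd if=/dev/zero of=file$i bs=1M count=1024    # 300 x 1GB files
  done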

Also in the output of pgiosim I see:

  25.17%,   2881 read,  0 written, 2304.56kB/sec  288.07 iops

which I interpret (left to right) as the % of the 100GB that has been
read, the number of reads and writes over some time period, the kB/sec
transferred, and the I/O operations/sec. Iops always seems to be
1/10th of the read number (rounded up to an integer). Is this
expected, and if so, does anybody know why?

While this is running if I also run "iostat -p /dev/sdc 5" I see:

  Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
  sdc 166.40  2652.80 4.80  13264 24
  sdc1   2818.80 1.20   999.20  6   4996

which I interpret as 2818 read I/O operations to the partition
(corresponding more or less to the read count in the pgiosim output),
of which only ~166 (the sdc tps) actually go to the drive??? with the
rest handled from the OS cache.

However the tps isn't increasing when I see pgiosim reporting:

   48.47%,   4610 read,  0 written, 3687.62kB/sec  460.95 iops

an iostat 5 output near the same time is reporting:

  Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
  sdc 165.87  2647.50 4.79  13264 24
  sdc1   2812.97 0.60   995.41  3   4987

so I am not sure there is a correlation between the pgiosim read
numbers and the iostat tps numbers.

Also, I am assuming the blocks written are filesystem metadata, although
that seems like a lot of data ...

If I stop the pgiosim, the iostat drops to 0 write and reads as
expected.

So does anybody have any comments on how to test with pgiosim and how
to correlate the iostat and pgiosim outputs?

Thanks for your feedback.
-- 
    -- rouilj

John Rouillard   System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111



[PERFORM] OT (slightly) testing for data loss on an SSD drive due to power failure

2011-04-22 Thread John Rouillard

Hi all:

I realize this is slightly off topic, but it is an issue of concern
with the use of SSDs. We are setting up a storage server under Solaris
using ZFS. We have a couple of SSDs (2 x 160GB Intel X25-M MLC SATA)
acting as the ZIL (write journal) and are trying to see if they are
safe to use in a power-fail situation.

Our testing (10 runs) hasn't shown any data loss, but I am not sure
our testing has been running long enough and is valid, so I hoped the
people here who have tested an ssd for data loss may have some
guidance.

The testing method is to copy a bunch of files over NFS to the server
with the ZIL. While the copy is running, pull the power to the
server. The NFS client will stall, and if the client got a message that
block X was written safely to the ZIL, it will continue writing with
block X+1. After the server comes back up and the copies
resume/finish, the files are checksummed. If block X went missing, the
checksums will fail and we will have our proof.
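
The checksum step itself is nothing fancy; roughly this, with hypothetical
paths, run from the NFS client:

  # before the copy: record checksums of the source files
  (cd /local/source && find . -type f -exec md5sum {} \;) > /tmp/before.md5
  # ... pull power, let the server come back, finish the copies ...
  # after: verify the copies and show only mismatches or missing files
  (cd /nfs/copy && md5sum -c /tmp/before.md5 | grep -v ': OK$')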

We are looking at how to max out the writes to the SSD, on the theory
that we need to fill the DRAM buffer on the SSD and get it saturated
enough that it can't flush data to permanent storage as fast as the
data is coming in (i.e. make it behave like a write-back cache with a
longer delay, so it's more likely to drop data).

Does anybody have any comments or testing methodologies that don't
involve using an actual postgres instance?

Thanks for your help.
-- 
    -- rouilj

John Rouillard   System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111



[PERFORM] Assessing performance of fetches

2011-04-18 Thread John Rouillard
Hi all:

An application running against a postgres 8.4.5 database under CentOS
5.5 uses cursors (I think via SqlAlchemy). To look for database
performance issues I log any query that takes > 2 seconds to complete.

I am seeing:

  2011-04-16 00:55:33 UTC user@database(3516): LOG:  duration:
 371954.811 ms  statement: FETCH FORWARD 1 FROM c_2aaeea50_a08

While I obviously have a problem here, is there any way to log the
actual select associated with the cursor, other than logging all
statements?

Also, once I have the select statement, does the fact that it is
associated with a fetch/cursor change the steps I should take in
tuning it, compared to somebody just issuing a normal select?

Thanks for any ideas.

-- 
-- rouilj

John Rouillard   System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111



Re: [PERFORM] Are we in the ballpark?

2011-02-02 Thread John Rouillard
On Wed, Feb 02, 2011 at 10:06:53AM -0700, Wayne Conrad wrote:
> On 02/01/11 18:30, Greg Smith wrote:
> >>Bonnie++ (-f -n 0 -c 4)
> >>$PGDATA/xlog (RAID1)
> >>random seek: 369/sec
> >>block out: 87 MB/sec
> >>block in: 180 MB/sec
> >>$PGDATA (RAID10, 12 drives)
> >>random seek: 452
> >>block out: 439 MB/sec
> >>block in: 881 MB/sec
> >>
> >>sysbench test of fsync (commit) rate:
> >>
> >>$PGDATA/xlog (RAID1)
> >>cache off: 29 req/sec
> >>cache on: 9,342 req/sec
> >>$PGDATA (RAID10, 12 drives)
> >>cache off: 61 req/sec
> >>cache on: 8,191 req/sec
> >
> >That random seek rate is a little low for 12 drives, but that's probably
> >the limitations of the 3ware controller kicking in there. Your "cache
> >off" figures are really weird though; I'd expect those both to be around
> >100. Makes me wonder if something weird is happening in the controller,
> >or if there was a problem with your config when testing that. Not a big
> >deal, really--the cached numbers are normally going to be the important
> >ones--but it is odd.
> 
> I also thought the "cache off" figures were odd.  I expected
> something much closer to 120 req/sec (7200 rpm drives).  I probably
> won't investigate that with any vigor, since the cache-on numbers
> are OK.

You may want to look into the "cache off" figures a little more. We
run a number of battery-backed raid controllers and we test the
batteries every 6 months or so. When we test the batteries, the cache
goes offline (as it should) to help keep the data valid.

If you need to test your raid card batteries (nothing like having a
battery with only a 6 hour runtime when it takes you a couple of days
MTTR), can your database app survive with that low a commit rate? As
you said, you were expecting something almost 4-5x faster from 7200 rpm
disks.
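
For what it's worth, the fsync (commit) rate test quoted above is normally
done with sysbench's fileio mode; a sketch along these lines (0.4-era
option spellings), run on the xlog volume once with the controller cache
on and once with it off:

  sysbench --test=fileio --file-num=1 --file-total-size=16384 prepare
  sysbench --test=fileio --file-num=1 --file-total-size=16384 \
           --file-test-mode=rndwr --file-fsync-freq=1 run

Re-running that while the battery is on a test or relearn cycle tells you
the commit rate you will actually live with during those windows.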

-- 
-- rouilj

John Rouillard   System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111



Re: [PERFORM] performance on new linux box

2010-07-08 Thread John Rouillard
On Thu, Jul 08, 2010 at 09:31:32AM -0700, Ryan Wexler wrote:
> Thanks a lot for all the comments.  The fact that both my windows box and
> the old linux box both show a massive performance improvement over the new
> linux box seems to point to hardware to me.  I am not sure how to test the
> fsync issue, but i don't see how that could be it.
> 
> The raid card the server has in it is:
> 3Ware 4 Port 9650SE-4LPML RAID Card
> 
> Looking it up, it seems to indicate that it has BBU

By "looking it up", I assume you mean running tw_cli and looking at
the output to make sure the bbu is enabled and the cache is turned on
for the raid array u0 or u1 ...?
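
Something along these lines; the exact subcommand spellings vary between
tw_cli releases, so treat it as a sketch:

  # controller 0: battery status, then unit and cache settings
  tw_cli /c0/bbu show all
  tw_cli /c0/u0 show all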

-- 
    -- rouilj

John Rouillard   System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111



Re: [PERFORM] Is Diskeeper Automatic Mode safe?

2009-11-16 Thread John Rouillard
On Mon, Nov 16, 2009 at 03:20:12PM -0500, Greg Smith wrote:
> Robert Schnabel wrote:
> >Nope.  Forgive my ignorance but isn't that what a UPS is for anyway?  
> >Along with a BBU controller.
>
> If you have a UPS *and* a BBU controller, then things are much 
> better--those should have a write cache that insulates you from the 
> worst of the problems.  But just a UPS alone doesn't help you very much:
> 
> 1) A UPS is built with a consumable (the battery), and they do wear 
> out.  Unless you're proactive about monitoring UPS battery quality and 
> doing tests, you won't find this out until the first time the power goes 
> out and the UPS doesn't work anymore.

Well, the BBU is just another battery (OK, some are capacitors, but...),
so the same caveats apply to a BBU raid card. We test ours every 6
months and fail them if they have less than 5 days of capacity (a
failure over a long weekend is 3 days, plus 1-2 days to fix the issue:
replace the power supply, mobo, etc.).

-- 
-- rouilj

John Rouillard   System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111



Re: [PERFORM] Very poor performance loading 100M of sql data using copy

2008-04-29 Thread John Rouillard
On Mon, Apr 28, 2008 at 02:16:02PM -0400, Greg Smith wrote:
> On Mon, 28 Apr 2008, John Rouillard wrote:
> 
> >   2008-04-21 11:36:43 UTC @(2761)i: LOG: checkpoints ... (27 seconds 
> >   apart)
> > so I changed:
> >  checkpoint_segments = 30
> >  checkpoint_warning = 150
> 
> That's good, but you might go higher than 30 for a bulk loading operation 
> like this, particularly on 8.1 where checkpoints are no fun.  Using 100 is 
> not unreasonable.

Ok. I can do that. I chose 30 to make the WAL logs span the 5 minute

  checkpoint_timeout = 300

so that the 30 segments wouldn't wrap over before the 5 minute
checkpoint that usually occurs. Maybe I should increase both the
timeout and the segments?
 
> >shared_buffers = 3000
> >I don't see any indication in the docs that increasing shared memory
> >would help speed up a copy operation.
> 
> The index blocks use buffer space, and what ends up happening if there's 
> not enough memory is they are written out more than they need to be (and 
> with your I/O hardware you need to avoid writes unless absolutely 
> necessary).

I forgot to mention that the raid 1/0 array is on a 3ware 9550SX-4LP
raid card. The write cache is on and autoverify is turned off.

> Theoretically the OS is caching around that situation but 
> better to avoid it. 

The system is using 6-8GB of memory for OS cache.

> You didn't say how much RAM you have,

16GB total, but 8GB or so is taken up with other processes.

> but you should 
> start by a factor of 10 increase to 30,000 and see if that helps; if so, 
> try making it large enough to use 1/4 of total server memory.  3000 is 
> only giving the server 24MB of RAM to work with, and it's unfair to expect 
> it to work well in that situation.

So, swap the memory usage from the OS cache to the postgresql process.
Using 1/4 as a guideline, it sounds like 600,000 (approx 4GB) is a
better setting. I'll try 300,000 to start (roughly 1/8 of memory) and
see what it does to the other processes on the box.
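
Pulling the numbers in this thread together, the bulk-load settings being
discussed come out to roughly this 8.1-style postgresql.conf fragment (the
values are just the ones mentioned above, not tested recommendations):

  checkpoint_segments = 100   # up from 30 for the bulk load
  checkpoint_timeout = 300    # keep the ~5 minute checkpoint interval
  checkpoint_warning = 150
  shared_buffers = 300000     # 8kB pages, ~2.4GB (about 1/8 of 16GB RAM)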
 
> While not relevant to this exercise you'll need to set 
> effective_cache_size to a useful value one day as well.

This is a very lightly loaded database, a few queries/hour usually
scattered across the data set, so hopefully that won't be much of an
issue.

-- 
-- rouilj

John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell)
603-643-9300 x 111



Re: [PERFORM] Very poor performance loading 100M of sql data using copy

2008-04-29 Thread John Rouillard
On Tue, Apr 29, 2008 at 05:19:59AM +0930, Shane Ambler wrote:
> John Rouillard wrote:
> 
> >We can't do this as we are backfilling a couple of months of data 
> >into tables with existing data.
> 
> Is this a one off data loading of historic data or an ongoing thing?

Yes, it's a one-off bulk load of many days of data. The daily loads
also take 3 hours, but that is OK since we only do those once a day,
so we have 21 hours of slack in the schedule 8-).

> >>>The only indexes we have to drop are the ones on the primary keys
> >>> (there is one non-primary key index in the database as well).
> 
> If this amount of data importing is ongoing then one thought I would try
> is partitioning (this could be worthwhile anyway with the amount of data
> you appear to have).
> Create an inherited table for the month being imported, load the data 
> into it, then add the check constraints, indexes, and modify the 
> rules/triggers to handle the inserts to the parent table.

Hmm, interesting idea, worth considering if we have to do this again
(I hope not).

Thanks for the reply.

-- 
-- rouilj

John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell)
603-643-9300 x 111



Re: [PERFORM] Very poor performance loading 100M of sql data using copy

2008-04-28 Thread John Rouillard
On Mon, Apr 28, 2008 at 06:53:09PM +0100, Heikki Linnakangas wrote:
> John Rouillard wrote:
> >We are running postgresql-8.1.3 under Centos 4
> You should upgrade, at least to the latest minor release of the 8.1 
> series (8.1.11), as there has been a bunch of important bug and security 
> fixes. Or even better, upgrade to 8.3, which has reduced the storage 
> size of especially variable length datatypes like text/char/varchar in 
> particular. As your COPY is I/O bound, reducing storage size will 
> translate directly to improved performance.

Yup. Just saw that suggestion in an unrelated email.
 
> >dm-6 is where the data files reside and dm-4 is where the WAL archives
> >are kept. Note all the DM's are on the same RAID 0 device /dev/sda2.
> 
> Another reason to upgrade to 8.3: if you CREATE or TRUNCATE the table in 
> the same transaction as you COPY into it, you can avoid WAL logging of 
> the loaded data, which will in the best case double your performance as 
> your WAL is on the same physical drives as the data files.

We can't do this as we are backfilling a couple of months of data into
tables with existing data.
 
> >The only indexes we have to drop are the ones on the primary keys
> >(there is one non-primary key index in the database as well).
> >
> >Can you drop an index on the primary key for a table and add it back
> >later?  Am I correct in saying: the primary key index is what enforces
> >the unique constraint in the table? If the index is dropped and
> >non-unique primary key data has been added, what happens when you
> >re-add the index?
> 
> Yes, the index is what enforces the uniqueness. You can drop the primary 
> key constraint, and add it back after the load with ALTER TABLE. If the 
> load introduces any non-unique primary keys, adding the primary key 
> constraint will give you an error and fail.

That's the part I am worried about. I guess using psql to delete the
problem row(s) and then re-adding the constraint will work.
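
For the record, the drop/reload/re-add cycle we're discussing would look
roughly like this, with made-up table, column, and constraint names:

  ALTER TABLE history DROP CONSTRAINT history_pkey;
  -- ... COPY the backfill data in ...
  -- check for duplicate keys before putting the constraint back:
  SELECT id, count(*) FROM history GROUP BY id HAVING count(*) > 1;
  ALTER TABLE history ADD CONSTRAINT history_pkey PRIMARY KEY (id);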
 
> Dropping and recreating the indexes is certainly worth trying.

Thanks for the info.

-- 
-- rouilj

John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell)
603-643-9300 x 111



[PERFORM] Very poor performance loading 100M of sql data using copy

2008-04-28 Thread John Rouillard
Can you drop an index on the primary key for a table and add it back
later? Am I correct in saying: the primary key index is what enforces
the unique constraint in the table? If the index is dropped and
non-unique primary key data has been added, what happens when you
re-add the index?

Does anybody have any ideas on what to check, or on why loading a 100MB
sql file using psql would take 3 hours?

Thanks in advance for any ideas.

--
        -- rouilj

John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell)
603-643-9300 x 111
