Re: [PERFORM] Weird XFS WAL problem
Greg Smith wrote:
> Kevin Grittner wrote:
>> I don't know at the protocol level; I just know that write barriers
>> do *something* which causes our controllers to wait for actual disk
>> platter persistence, while fsync does not.
>
> It's in the docs now:
> http://www.postgresql.org/docs/9.0/static/wal-reliability.html
>
> FLUSH CACHE EXT is the ATAPI-6 call that filesystems use to enforce
> barriers on that type of drive. [...] SAS systems have a similar call
> named SYNCHRONIZE CACHE.

Great information!  I have added the attached documentation patch to
explain the write-barrier/BBU interaction.  This will appear in the 9.0
documentation.

--
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +

Index: doc/src/sgml/wal.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/wal.sgml,v
retrieving revision 1.66
diff -c -c -r1.66 wal.sgml
*** doc/src/sgml/wal.sgml	13 Apr 2010 14:15:25 -0000	1.66
--- doc/src/sgml/wal.sgml	7 Jul 2010 13:55:58 -0000
***************
*** 48,68 ****
      some later time. Such caches can be a reliability hazard because the
      memory in the disk controller cache is volatile, and will lose its
      contents in a power failure.  Better controller cards have
!     <firstterm>battery-backed</> caches, meaning the card has a battery that
      maintains power to the cache in case of system power loss.  After power
      is restored the data will be written to the disk drives.
     </para>

     <para>
      And finally, most disk drives have caches. Some are write-through
!     while some are write-back, and the
!     same concerns about data loss exist for write-back drive caches as
!     exist for disk controller caches.  Consumer-grade IDE and SATA drives are
!     particularly likely to have write-back caches that will not survive a
!     power failure, though <acronym>ATAPI-6</> introduced a drive cache
!     flush command (FLUSH CACHE EXT) that some file systems use, e.g. <acronym>ZFS</>.
!     Many solid-state drives (SSD) also have volatile write-back
!     caches, and many do not honor cache flush commands by default.
      To check write caching on <productname>Linux</> use <command>hdparm -I</>;
      it is enabled if there is a <literal>*</> next to <literal>Write cache</>;
      <command>hdparm -W</> to turn off
--- 48,74 ----
      some later time. Such caches can be a reliability hazard because the
      memory in the disk controller cache is volatile, and will lose its
      contents in a power failure.  Better controller cards have
!     <firstterm>battery-backed unit</> (<acronym>BBU</>) caches, meaning
!     the card has a battery that
      maintains power to the cache in case of system power loss.  After power
      is restored the data will be written to the disk drives.
     </para>

     <para>
      And finally, most disk drives have caches. Some are write-through
!     while some are write-back, and the same concerns about data loss
!     exist for write-back drive caches as exist for disk controller
!     caches.  Consumer-grade IDE and SATA drives are particularly likely
!     to have write-back caches that will not survive a power failure,
!     though <acronym>ATAPI-6</> introduced a drive cache flush command
!     (<command>FLUSH CACHE EXT</>) that some file systems use, e.g.
!     <acronym>ZFS</>, <acronym>ext4</>.  (The SCSI command
!     <command>SYNCHRONIZE CACHE</> has long been available.) Many
!     solid-state drives (SSD) also have volatile write-back caches, and
!     many do not honor cache flush commands by default.
!    </para>
!
!    <para>
      To check write caching on <productname>Linux</> use <command>hdparm -I</>;
      it is enabled if there is a <literal>*</> next to <literal>Write cache</>;
      <command>hdparm -W</> to turn off
***************
*** 83,88 ****
--- 89,113 ----
     </para>

     <para>
+     Many file systems that use write barriers (e.g. <acronym>ZFS</>,
+     <acronym>ext4</>) internally
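(A minimal sketch of the drive-cache check described in the patch text,
assuming a Linux host; /dev/sda is only an example device:)

  # Write caching is enabled if the "Write cache" feature is marked with
  # a leading "*" in the drive's identification output.
  hdparm -I /dev/sda | grep -i "write cache"

  # Turn the drive's volatile write cache off (0) or back on (1).
  hdparm -W 0 /dev/sda
  hdparm -W 1 /dev/sda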
Re: [PERFORM] Weird XFS WAL problem
Kevin Grittner wrote:
> I don't know at the protocol level; I just know that write barriers
> do *something* which causes our controllers to wait for actual disk
> platter persistence, while fsync does not.

It's in the docs now:
http://www.postgresql.org/docs/9.0/static/wal-reliability.html

FLUSH CACHE EXT is the ATAPI-6 call that filesystems use to enforce
barriers on that type of drive.  Here's what the relevant portion of the
ATAPI spec says:

  "This command is used by the host to request the device to flush the
  write cache.  If there is data in the write cache, that data shall be
  written to the media.  The BSY bit shall remain set to one until all
  data has been successfully written or an error occurs."

SAS systems have a similar call named SYNCHRONIZE CACHE.

The improvement I actually expect to arrive here first is a reliable
implementation of O_SYNC/O_DSYNC writes.  Both SAS and SATA drives that
are capable of doing Native Command Queueing support a write type called
Force Unit Access, which is essentially just like a direct write that
cannot be cached.  When we get more kernels with reliable sync writing
that maps under the hood to FUA, and can change wal_sync_method to use
them, the need to constantly call fsync for every write to the WAL will
go away.  Then the "blow out the RAID cache when barriers are on"
behavior will only show up during checkpoint fsyncs, which will make
things a lot better (albeit still not ideal).

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us
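(For reference, the wal_sync_method setting Greg mentions is chosen in
postgresql.conf; a sketch of the relevant lines, using setting names
that exist in PostgreSQL of this vintage:)

  # postgresql.conf -- how PostgreSQL forces WAL writes out to disk.
  # fsync/fdatasync issue an explicit flush per commit; the open_*
  # methods use O_SYNC/O_DSYNC writes, which is where FUA would matter.
  wal_sync_method = fdatasync    # or fsync, fsync_writethrough,
                                 # open_datasync, open_sync
  fsync = on                     # never disable on data you care about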
Re: [PERFORM] Weird XFS WAL problem
On Thu, 3 Jun 2010, Greg Smith wrote:
> And it's also quite reasonable for a RAID controller to respond to
> that "flush the whole cache" call by flushing its cache.

Remember that the RAID controller is presenting itself to the OS as a
large disc, and hiding the individual discs from the OS.  Why should the
OS care what has actually happened to the individual discs' caches, as
long as that "flush the whole cache" command guarantees that the data is
persistent?  Taking the RAID array as a whole, that happens when the
data hits the write-back cache.

The only circumstance where you actually need to flush the data to the
individual discs is when you need to take that disc away somewhere else
and read it on another system.  That's quite a rare use case for a RAID
array (http://thedailywtf.com/Articles/RAIDing_Disks.aspx
notwithstanding).

> If the controller had some logic that said "it's OK to not flush the
> cache when that call comes in if my battery is working fine", that
> would make this whole problem go away.

The only place this can be properly sorted is the RAID controller.
Anywhere else would be crazy.

Matthew

--
To err is human; to really louse things up requires root privileges.
                          -- Alexander Pope, slightly paraphrased
Re: [PERFORM] Weird XFS WAL problem
Greg Smith wrote:
> Kevin Grittner wrote:
>> I've seen this, too (with xfs).  Our RAID controller, in spite of
>> having BBU cache configured for writeback, waits for actual
>> persistence on disk for write barriers (unlike for fsync).  This
>> does strike me as surprising to the point of bordering on qualifying
>> as a bug.
>
> Completely intentional, and documented at
> http://xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F
>
> The issue is that XFS will actually send the full "flush your cache"
> call to the controller, rather than just the usual fsync call, and
> that eliminates the benefit of having a write cache there in the
> first place.  Good controllers respect that and flush their whole
> write cache out.  And ext4 has adopted the same mechanism.
>
> [...] It does mean that everyone with a non-volatile battery-backed
> cache, via RAID card typically, needs to turn barriers off manually.
>
> I've already warned on this list that PostgreSQL commit performance
> on ext4 is going to appear really terrible to many people.  If you
> benchmark and don't recognize ext3 wasn't operating in a reliable
> mode before, the performance drop now that ext4 is doing the right
> thing with barriers looks impossibly bad.

Well, this is depressing.  Now that we finally have common
battery-backed cache RAID controller cards, the file system developers
have thrown down another roadblock in ext4 and xfs.  Do we need to
document this?

On another topic, I am a little unclear on how things behave when the
drive is write-back.  If the RAID controller card writes to the drive,
but the data isn't on the platters, how does it know when it can
discard that information from the BBU RAID cache?

--
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
Re: [PERFORM] Weird XFS WAL problem
Bruce Momjian <br...@momjian.us> wrote:
> On another topic, I am a little unclear on how things behave when the
> drive is write-back.  If the RAID controller card writes to the
> drive, but the data isn't on the platters, how does it know when it
> can discard that information from the BBU RAID cache?

The controller waits for the drive to tell it that it has made it to
the platter before it discards it.  What made you think otherwise?

-Kevin
Re: [PERFORM] Weird XFS WAL problem
Kevin Grittner wrote:
> The controller waits for the drive to tell it that it has made it to
> the platter before it discards it.  What made you think otherwise?

Because a write-back drive cache says it is on the drive before it hits
the platters, which I think is the default for SATA drives.  Is that
inaccurate?

--
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
Re: [PERFORM] Weird XFS WAL problem
Bruce Momjian <br...@momjian.us> wrote:
> Because a write-back drive cache says it is on the drive before it
> hits the platters, which I think is the default for SATA drives.  Is
> that inaccurate?

Any decent RAID controller will ensure that the drives themselves
aren't using write-back caching.  When we've mentioned write-back
versus write-through on this thread we've been talking about the
behavior of the *controller*.  We have our controllers configured to
use write-back through the BBU cache as long as the battery is good,
but to automatically switch to write-through if the battery goes bad.

-Kevin
Re: [PERFORM] Weird XFS WAL problem
Kevin Grittner wrote:
> Any decent RAID controller will ensure that the drives themselves
> aren't using write-back caching.  When we've mentioned write-back
> versus write-through on this thread we've been talking about the
> behavior of the *controller*.  We have our controllers configured to
> use write-back through the BBU cache as long as the battery is good,
> but to automatically switch to write-through if the battery goes bad.

OK, good, but why would a BBU RAID controller flush stuff to disk with
a flush-all command?  I thought the whole goal of BBU was to avoid such
flushes.  What is unique about the command ext4/xfs is sending?

--
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
Re: [PERFORM] Weird XFS WAL problem
Bruce Momjian <br...@momjian.us> wrote:
> OK, good, but why would a BBU RAID controller flush stuff to disk
> with a flush-all command?  I thought the whole goal of BBU was to
> avoid such flushes.

That has been *precisely* my point.  I don't know at the protocol
level; I just know that write barriers do *something* which causes our
controllers to wait for actual disk platter persistence, while fsync
does not.  The write barrier concept seems good to me, and I wish it
could be used at the OS level without killing performance.  I blame the
controller, for not treating it the same as fsync (i.e., as long as
it's in write-back mode it should treat data as persisted as soon as
it's in BBU cache).

-Kevin
Re: [PERFORM] Weird XFS WAL problem
Kevin Grittner wrote:
> That has been *precisely* my point.  I don't know at the protocol
> level; I just know that write barriers do *something* which causes
> our controllers to wait for actual disk platter persistence, while
> fsync does not.  The write barrier concept seems good to me, and I
> wish it could be used at the OS level without killing performance.  I
> blame the controller, for not treating it the same as fsync (i.e., as
> long as it's in write-back mode it should treat data as persisted as
> soon as it's in BBU cache).

Yeah.  I wonder if it honors the cache flush because it might think it
is replacing disks or something odd.  I think we are going to have to
document this in 9.0 because obviously you have seen it already.

Is this an issue with SAS cards/drives as well?

--
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
Re: [PERFORM] Weird XFS WAL problem
On Wed, Jun 2, 2010 at 7:30 PM, Craig James <craig_ja...@emolecules.com> wrote:
> I'm testing/tuning a new midsize server and ran into an inexplicable
> problem.  With an RAID10 drive, when I move the WAL to a separate
> RAID1 drive, TPS drops from over 1200 to less than 90!  I've checked
> everything and can't find a reason.
>
> [... hardware details, bonnie++ and pgbench output snipped ...]
>
> So ... anyone have any idea at all how TPS drops to below 90 when I
> move the WAL to a separate RAID1 disk?  Does this make any sense at
> all?  It's repeatable.  It happens for both ext4 and xfs.  It's weird.

*) Is your RAID1 configured with write-back cache on the controller?

*) Have you tried changing wal_sync_method to fdatasync?

merlin
Re: [PERFORM] Weird XFS WAL problem
Craig James wrote:
> I'm testing/tuning a new midsize server and ran into an inexplicable
> problem.  With an RAID10 drive, when I move the WAL to a separate
> RAID1 drive, TPS drops from over 1200 to less than 90!

Normally 100 TPS means that the write cache on the WAL drive volume is
disabled (or set to write-through instead of write-back).  When things
in this area get fishy, I will usually download sysbench and have it
specifically test how many fsync calls can happen per second.
http://projects.2ndquadrant.com/talks , "Database Hardware
Benchmarking", page 28 has an example of the right incantation for
that.

Also, make sure you run 3ware's utilities and confirm all the disks
have finished their initialization and verification stages.  If you
just adjusted disk layout and immediately launched into benchmarks,
those are useless until the background cleanup is done.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us
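(A rough sketch of the kind of sysbench fsync-rate test mentioned
above; the exact flags differ between sysbench versions, so treat this
as an assumption-laden example rather than the incantation from the
referenced talk:)

  # Create a small test file, then do random writes with an fsync after
  # every write; the reported requests/sec approximates commits/sec.
  sysbench --test=fileio --file-num=1 --file-total-size=16384 prepare
  sysbench --test=fileio --file-num=1 --file-total-size=16384 \
           --file-test-mode=rndwr --file-fsync-freq=1 run
  sysbench --test=fileio --file-num=1 cleanup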
Re: [PERFORM] Weird XFS WAL problem
On 6/2/10 4:40 PM, Mark Kirkwood wrote:
> On 03/06/10 11:30, Craig James wrote:
>> I'm testing/tuning a new midsize server and ran into an inexplicable
>> problem.  With an RAID10 drive, when I move the WAL to a separate
>> RAID1 drive, TPS drops from over 1200 to less than 90!  I've checked
>> everything and can't find a reason.
>
> Are the 2 new RAID1 disks the same make and model as the 12 RAID10
> ones?

Yes.

> Also, are barriers *on* on the RAID1 mount and off on the RAID10 one?

It was the barriers.  barrier=1 isn't just a bad idea on ext4, it's a
disaster.

  pgbench -i -s 100 -U test
  pgbench -c 10 -t 1 -U test

Change WAL to barrier=0
  tps = 1463.264981 (including connections establishing)
  tps = 1463.725687 (excluding connections establishing)

Change WAL to noatime, nodiratime, barrier=0
  tps = 1479.331476 (including connections establishing)
  tps = 1479.810545 (excluding connections establishing)

Change WAL to barrier=1
  tps = 82.325446 (including connections establishing)
  tps = 82.326874 (excluding connections establishing)

This is really hard to believe, because the bonnie++ numbers and dd(1)
numbers look good (see my original post).  But it's totally repeatable.
It must be some really unfortunate "just missed the next sector going
by the write head" problem.  So with ext4, bonnie++ and dd aren't the
whole story.

BTW, I also learned that if you edit /etc/fstab and use "mount -o
remount" it WON'T change barrier=0/1 unless it is explicit in the fstab
file.  That is, if you put barrier=0 into /etc/fstab and use the
remount, it will change it to no barriers.  But if you then remove it
from /etc/fstab, it won't change it back to the default.  You have to
actually put barrier=1 if you want to get it back to the default.  This
seems like a bug to me, and it made it really hard to track this down.
"mount -o remount" is not the same as umount/mount!

Craig
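(To make the remount gotcha above concrete, a hypothetical ext4 WAL
mount; the device and mount point here are invented for illustration:)

  # /etc/fstab entry for the WAL volume with barriers explicitly off:
  /dev/sdc1  /var/lib/pgsql/pg_xlog  ext4  noatime,nodiratime,barrier=0  0 0

  mount -o remount /var/lib/pgsql/pg_xlog      # picks up barrier=0
  # Removing barrier=0 from fstab and remounting does NOT restore the
  # default; state barrier=1 explicitly (or umount/mount), then verify:
  mount -o remount,barrier=1 /var/lib/pgsql/pg_xlog
  grep pg_xlog /proc/mounts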
Re: [PERFORM] Weird XFS WAL problem
On Thu, 3 Jun 2010, Craig James wrote:
>> Also, are barriers *on* on the RAID1 mount and off on the RAID10
>> one?
>
> It was the barriers.  barrier=1 isn't just a bad idea on ext4, it's a
> disaster.

This worries me a little.  Does your array have a battery-backed cache?
If so, then it should be fast regardless of barriers (although barriers
may make a small difference).  If it does not, then it is likely that
the fast speed you are seeing with barriers off is unsafe.

There should be no "just missed the sector going past for write"
problem ever with a battery-backed cache.

Matthew

--
There once was a limerick .sig
that really was not very big
It was going quite fine
Till it reached the fourth line
Re: [PERFORM] Weird XFS WAL problem
Matthew Wakeling <matt...@flymine.org> wrote:
> This worries me a little.  Does your array have a battery-backed
> cache?  If so, then it should be fast regardless of barriers
> (although barriers may make a small difference).  If it does not,
> then it is likely that the fast speed you are seeing with barriers
> off is unsafe.

I've seen this, too (with xfs).  Our RAID controller, in spite of
having BBU cache configured for writeback, waits for actual persistence
on disk for write barriers (unlike for fsync).  This does strike me as
surprising to the point of bordering on qualifying as a bug.  It means
that you can't take advantage of the BBU cache and get the benefit of
write barriers in OS cache behavior.  :-(

-Kevin
Re: [PERFORM] Weird XFS WAL problem
Kevin Grittner wrote:
> I've seen this, too (with xfs).  Our RAID controller, in spite of
> having BBU cache configured for writeback, waits for actual
> persistence on disk for write barriers (unlike for fsync).  This does
> strike me as surprising to the point of bordering on qualifying as a
> bug.

Completely intentional, and documented at
http://xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F

The issue is that XFS will actually send the full "flush your cache"
call to the controller, rather than just the usual fsync call, and that
eliminates the benefit of having a write cache there in the first
place.  Good controllers respect that and flush their whole write cache
out.  And ext4 has adopted the same mechanism.

This is very much a good thing from the perspective of database
reliability for people with regular, cheap hard drives that don't have
a useful write cache in front of them.  It allows them to keep the
disk's write cache on for other things, while still getting the proper
cache flushes when the database commits demand them.  It does mean that
everyone with a non-volatile battery-backed cache, via RAID card
typically, needs to turn barriers off manually.

I've already warned on this list that PostgreSQL commit performance on
ext4 is going to appear really terrible to many people.  If you
benchmark and don't recognize ext3 wasn't operating in a reliable mode
before, the performance drop now that ext4 is doing the right thing
with barriers looks impossibly bad.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us
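(A sketch of what turning barriers off manually looks like at mount
time, assuming a trustworthy battery-backed write-back cache; the mount
points are examples only:)

  # xfs: barriers are on by default, disable with nobarrier
  mount -o remount,nobarrier /data

  # ext4: barriers are on by default, disable with barrier=0
  mount -o remount,barrier=0 /pg_xlog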
Re: [PERFORM] Weird XFS WAL problem
Craig James wrote:
> This is really hard to believe, because the bonnie++ numbers and
> dd(1) numbers look good (see my original post).  But it's totally
> repeatable.  It must be some really unfortunate "just missed the next
> sector going by the write head" problem.

Commit performance is a separate number to measure, one that is not
reflected in any benchmark that tests sequential performance.  I
consider it the fourth axis of disk system performance (seq read, seq
write, random IOPS, commit rate), and directly measure it with the
sysbench fsync test I recommended already.  (You can do it with the
right custom pgbench script too.)

You only get one commit per rotation on a drive, which is exactly what
you're seeing: a bit under the 120 spins/second @ 7200 RPM.  Attempts
to time things just right to catch more than one sector per spin are
extremely difficult to accomplish; I spent a week on that once without
making any good progress.  You can easily get 100MB/s on reads and
writes but only manage 100 commits/second.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us
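(One possible version of the "custom pgbench script" approach alluded
to above; the table and file names are hypothetical.  Each transaction
does a single tiny insert, so the reported TPS is essentially the WAL
commit rate of the device:)

  psql -U test -c "CREATE TABLE commit_test (t timestamptz);"
  echo "INSERT INTO commit_test VALUES (now());" > commit_test.sql
  pgbench -n -c 1 -t 5000 -f commit_test.sql -U test test
  # Expected ceiling when every commit reaches the platters:
  # 7200 RPM / 60 = 120 rotations/sec, i.e. at most ~120 commits/sec,
  # consistent with the ~82 TPS observed on the RAID1 WAL volume.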
Re: [PERFORM] Weird XFS WAL problem
Greg Smith <g...@2ndquadrant.com> wrote:
> Kevin Grittner wrote:
>> I've seen this, too (with xfs).  Our RAID controller, in spite of
>> having BBU cache configured for writeback, waits for actual
>> persistence on disk for write barriers (unlike for fsync).  This
>> does strike me as surprising to the point of bordering on
>> qualifying as a bug.
>
> Completely intentional, and documented at
> http://xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F

Yeah, I read that long ago and I've disabled write barriers because of
it; however, it still seems wrong that the RAID controller insists on
flushing to the drives in write-back mode.  Here are my reasons for
wishing it was otherwise:

(1)  We've had batteries on our RAID controllers fail occasionally.
The controller automatically degrades to write-through, and we get an
email from the server and schedule a tech to travel to the site and
replace the battery; but until we take action we are now exposed to
possible database corruption.  Barriers don't automatically come on
when the controller flips to write-through mode.

(2)  It precludes any possibility of moving from fsync techniques to
write barrier techniques for ensuring database integrity.  If the OS
respected write barriers and the controller considered the write
satisfied when it hit BBU cache, write barrier techniques would work,
and checkpoints could be made smoother.  Think how nicely that would
inter-operate with point (1).

So, while I understand it's Working As Designed, I think the design is
surprising and sub-optimal.

-Kevin
Re: [PERFORM] Weird XFS WAL problem
On Thu, Jun 3, 2010 at 12:40 PM, Kevin Grittner
<kevin.gritt...@wicourts.gov> wrote:
> Yeah, I read that long ago and I've disabled write barriers because
> of it; however, it still seems wrong that the RAID controller insists
> on flushing to the drives in write-back mode.  Here are my reasons
> for wishing it was otherwise:

I think it's a case of the quickest, simplest answer to semi-new tech.
Not sure what to do with barriers?  Just flush the whole cache.  I'm
guessing that this will get optimized in the future.

BTW, I'll have LSI Megaraid latest and greatest to test on in a month,
and older Areca 1680s as well.  I'll be updating the firmware on the
Arecas, and I'll run some tests on the whole barrier behaviour to see
if it's gotten any better lately.
Re: [PERFORM] Weird XFS WAL problem
Scott Marlowe <scott.marl...@gmail.com> wrote:
> I think it's a case of the quickest, simplest answer to semi-new
> tech.  Not sure what to do with barriers?  Just flush the whole
> cache.  I'm guessing that this will get optimized in the future.

Let's hope so.

That reminds me, the write barrier concept is at least on the horizon
as a viable technology; does anyone know if the "asynchronous graphs"
concept in this (one page) paper ever came to anything?  (I haven't
heard anything about it lately.)

http://www.usenix.org/events/fast05/wips/burnett.pdf

-Kevin
Re: [PERFORM] Weird XFS WAL problem
Scott Marlowe wrote:
> I think it's a case of the quickest, simplest answer to semi-new
> tech.  Not sure what to do with barriers?  Just flush the whole
> cache.

Well, that really is the only useful thing you can do with regular SATA
drives; the ATA command set isn't any finer grained than that in a way
that's useful for this context.  And it's also quite reasonable for a
RAID controller to respond to that "flush the whole cache" call by
flushing its cache.  So it's not just the simplest first answer, I
believe it's the only answer until a better ATA command set becomes
available.

I think this can only be resolved usefully for all of us at the RAID
firmware level.  If the controller had some logic that said "it's OK to
not flush the cache when that call comes in if my battery is working
fine", that would make this whole problem go away.  I don't expect it's
possible to work around the exact set of concerns Kevin listed any
other way, because as he pointed out the right thing to do is very
dependent on the battery health, which the OS also doesn't know (again,
that would require some new command set verbiage).

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us
Re: [PERFORM] Weird XFS WAL problem
On Thu, Jun 3, 2010 at 1:31 PM, Greg Smith <g...@2ndquadrant.com> wrote:
> I think this can only be resolved usefully for all of us at the RAID
> firmware level.  If the controller had some logic that said "it's OK
> to not flush the cache when that call comes in if my battery is
> working fine", that would make this whole problem go away.

That's what already happens for fsync on a BBU controller, so I don't
think the code to do so would be something fancy and new, just a simple
change of logic on which code path to take.
Re: [PERFORM] Weird XFS WAL problem
Greg Smith <g...@2ndquadrant.com> wrote:
> I think this can only be resolved usefully for all of us at the RAID
> firmware level.  If the controller had some logic that said "it's OK
> to not flush the cache when that call comes in if my battery is
> working fine", that would make this whole problem go away.

That is exactly what I've been trying to suggest.  Sorry for not being
more clear about it.

-Kevin
[PERFORM] Weird XFS WAL problem
I'm testing/tuning a new midsize server and ran into an inexplicable
problem.  With an RAID10 drive, when I move the WAL to a separate RAID1
drive, TPS drops from over 1200 to less than 90!  I've checked
everything and can't find a reason.  Here are the details.

  8 cores (2x4 Intel Nehalem 2 GHz)
  12 GB memory
  12 x 7200 SATA 500 GB disks
  3WARE 9650SE-12ML RAID controller with BBU
    2 disks: RAID1 500GB ext4 blocksize=4096
    8 disks: RAID10 2TB, stripe size 64K, blocksize=4096
             (ext4 or xfs - see below)
    2 disks: hot swap
  Ubuntu 10.04 LTS (Lucid)

With xfs or ext4 on the RAID10 I got decent bonnie++ and pgbench
results (this one is for xfs):

Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
argon        24064M 70491  99 288158  25 129918  16 65296  97 428210  23 558.9   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 23283  81     + +++ 13775  56 20143  74     + +++ 15152  54
argon,24064M,70491,99,288158,25,129918,16,65296,97,428210,23,558.9,1,16,23283,81,+,+++,13775,56,20143,74,+,+++,15152,54

  pgbench -i -s 100 -U test
  pgbench -c 10 -t 1 -U test
  scaling factor: 100
  query mode: simple
  number of clients: 10
  number of transactions per client: 1
  number of transactions actually processed: 10/10
  tps = 1046.104635 (including connections establishing)
  tps = 1046.337276 (excluding connections establishing)

Now the mystery: I moved the pg_xlog directory to a RAID1 array (same
3WARE controller, two more SATA 7200 disks).  Run the same tests
and ...

  tps = 82.325446 (including connections establishing)
  tps = 82.326874 (excluding connections establishing)

I thought I'd made a mistake, like maybe I moved the whole database to
the RAID1 array, but I checked and double checked.  I even watched the
lights blink - the WAL was definitely on the RAID1 and the rest of
Postgres on the RAID10.  So I moved the WAL back to the RAID10 array,
and performance jumped right back up to the 1200 TPS range.

Next I checked the RAID1 itself:

  dd if=/dev/zero of=./bigfile bs=8192 count=200

which yielded 98.8 MB/sec - not bad.  bonnie++ on the RAID1 pair showed
good performance too:

Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
argon        24064M 68601  99 110057  18 46534   6 59883  90 123053   7 471.3   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16     + +++     + +++     + +++     + +++     + +++     + +++
argon,24064M,68601,99,110057,18,46534,6,59883,90,123053,7,471.3,1,16,+,+++,+,+++,+,+++,+,+++,+,+++,+,+++

So ... anyone have any idea at all how TPS drops to below 90 when I
move the WAL to a separate RAID1 disk?  Does this make any sense at
all?  It's repeatable.  It happens for both ext4 and xfs.  It's weird.

You can even watch the disk lights and see it: the RAID10 disks are on
almost constantly when the WAL is on the RAID10, but when you move the
WAL over to the RAID1, its lights are dim and flicker a lot, like it's
barely getting any data, and the RAID10 disk's lights barely go on at
all.

Thanks,
Craig
Re: [PERFORM] Weird XFS WAL problem
On 03/06/10 11:30, Craig James wrote:
> I'm testing/tuning a new midsize server and ran into an inexplicable
> problem.  With an RAID10 drive, when I move the WAL to a separate
> RAID1 drive, TPS drops from over 1200 to less than 90!  I've checked
> everything and can't find a reason.

Are the 2 new RAID1 disks the same make and model as the 12 RAID10
ones?

Also, are barriers *on* on the RAID1 mount and off on the RAID10 one?

Cheers

Mark