Re: scsi vs ide performance on fsync's

2001-03-12 Thread Andre Hedrick

On Wed, 7 Mar 2001, Stephen C. Tweedie wrote:

> Hi,
> 
> On Tue, Mar 06, 2001 at 10:44:34AM -0800, Linus Torvalds wrote:
> 
> > On Tue, 6 Mar 2001, Alan Cox wrote:
> > > You want a write barrier. Write buffering (at least for short intervals) in
> > > the drive is very sensible. The kernel needs to be able to send drivers a
> > > write barrier which will not be completed while commands issued before the
> > > barrier are still outstanding.
> > 
> > But Alan is right - we need a "sync" command or something. I don't know
> > if IDE has one (it already might, for all I know).
> 
> Sync and barrier are very different models.  With barriers we can
> enforce some element of write ordering without actually waiting for the
> IOs to complete; with sync, we're explicitly asking to be told when
> the data has become persistent.  We can make use of both of these.
> 
> SCSI certainly lets us do both of these operations independently.  IDE
> has the sync/flush command afaik, but I'm not sure whether the IDE
> tagged command stuff has the equivalent of SCSI's ordered tag bits.
> Andre?

ATA-TCQ sucks, to put it plain and simple.  It really requires a special
host, and only the HPT366 series works.  It is similar, but its exact
nature is not clear.  We are debating the usage of it now in T13.

Cheers,

Andre Hedrick
Linux ATA Development

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: scsi vs ide performance on fsync's

2001-03-09 Thread Matthias Urlichs

Hi,

Jens Axboe:
> > But most disks these days support IDE-SCSI, and SCSI does have ordered
> > tags, so...
> 
> Any proof to back this up? To my knowledge, only some WDC ATA disks
> can be ATAPI driven.
> 
Ummm, no, but that was my impression. If that's wrong, I apologize and
will state the opposite, next time.

-- 
Matthias Urlichs | noris network AG | http://smurf.noris.de/
-- 
You see things; and you say 'Why?'
But I dream things that never were; and I say 'Why not?'
   --George Bernard Shaw [Back to Methuselah]



Re: scsi vs ide performance on fsync's

2001-03-09 Thread Jens Axboe

On Fri, Mar 09 2001, Matthias Urlichs wrote:
> Matthias Urlichs:
> > On Wed, Mar 07 2001, Stephen C. Tweedie wrote:
> > > SCSI certainly lets us do both of these operations independently.  IDE
> > > has the sync/flush command afaik, but I'm not sure whether the IDE
> > > tagged command stuff has the equivalent of SCSI's ordered tag bits.
> > > Andre?
> > 
> > IDE has no concept of ordered tags...
> > 
> But most disks these days support IDE-SCSI, and SCSI does have ordered
> tags, so...

Any proof to back this up? To my knowledge, only some WDC ATA disks
can be ATAPI driven.

-- 
Jens Axboe




Re: scsi vs ide performance on fsync's

2001-03-09 Thread Jonathan Morton

>> It's pretty clear that the IDE drive(r) is *not* waiting for the physical
>> write to take place before returning control to the user program, whereas
>> the SCSI drive(r) is.
>
>This would not be unexpected.
>
>IDE drives generally do write buffering. I don't even know if you
>_can_ turn it off. So the drive claims to have written the data as soon as
>it has made it into the write buffer.
>
>It's definitely not the driver, but the actual drive.

As I suspected.  However, testing shows that many drives, including most
IBMs, do respond to hdparm -W0 which turns write-caching off (some drives
don't, including some Seagates).  There are also drives in existence that
have no cache at all (mostly old sub-1G drives) and some with too little
for this to make a significant difference (the old 1.2G TravelStar in one
of my PowerBooks is an example).

So, is there a way to force (the majority of, rather than all) IDE drives
to wait until it's been truly committed to media?  If so, will this be
integrated into the appropriate parts of the kernel, particularly for
certain members of the sync() family and FS unmounting?

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r- y+
-END GEEK CODE BLOCK-





Re: scsi vs ide performance on fsync's

2001-03-08 Thread Matthias Urlichs

Hi,

Matthias Urlichs:
> On Wed, Mar 07 2001, Stephen C. Tweedie wrote:
> > SCSI certainly lets us do both of these operations independently.  IDE
> > has the sync/flush command afaik, but I'm not sure whether the IDE
> > tagged command stuff has the equivalent of SCSI's ordered tag bits.
> > Andre?
> 
> IDE has no concept of ordered tags...
> 
But most disks these days support IDE-SCSI, and SCSI does have ordered
tags, so...

Has anybody done speed comparisons between "native" IDE and IDE-SCSI?

-- 
Matthias Urlichs | noris network AG | http://smurf.noris.de/
-- 
Success is something I will dress for when I get there, and not until.



Re: scsi vs ide performance on fsync's

2001-03-08 Thread Chris Mason



On Wednesday, March 07, 2001 08:56:59 PM + "Stephen C. Tweedie"
<[EMAIL PROTECTED]> wrote:

> Hi,
> 
> On Wed, Mar 07, 2001 at 09:15:36PM +0100, Jens Axboe wrote:
>> On Wed, Mar 07 2001, Stephen C. Tweedie wrote:
>> > 
>> > For most fs'es, that's not an issue.  The fs won't start writeback on
>> > the primary disk at all until the journal commit has been acknowledged
>> > as firm on disk.
>> 
>> But do you then force wait on that journal commit?
> 
> It doesn't matter too much --- it's only the writeback which is doing
> this (ext3 uses a separate journal thread for it), so any sleep is
> only there to wait for the moment when writeback can safely begin:
> users of the filesystem won't see any stalls.

It is similar under reiserfs unless the log is full and new transactions
have to wait for flushes to free up the log space.  It is probably valid to
assume the dedicated log device will be large enough that this won't happen
very often, or fast enough (nvram) that it won't matter when it does happen.

> 
>> A barrier operation is sufficient then. So you're saying don't
>> over design, a simple barrier is all you need?
> 
> Pretty much so.  The simple barrier is the only thing which can be
> effectively optimised at the hardware level with SCSI anyway.
> 

The simple barrier is a good starting point regardless.  If we can find
hardware where it makes sense to do cross queue barriers (big raid
controllers?), it might be worth trying.

-chris






Re: scsi vs ide performance on fsync's

2001-03-08 Thread Stephen C. Tweedie

Hi,

On Wed, Mar 07, 2001 at 10:36:38AM -0800, Linus Torvalds wrote:
> On Wed, 7 Mar 2001, Jeremy Hansen wrote:
> > 
> > So in the meantime as this gets worked out on a lower level, we've decided
> > to take the fsync() out of berkeley db for mysql transaction logs and
> > mount the filesystem -o sync.
> > 
> > Can anyone perhaps tell me why this may be a bad idea?
> 
>  - it doesn't help. The disk will _still_ do write buffering. It's the
>    DISK, not the OS. It doesn't matter what you do.
>  - your performance will suck.

Added to which, "-o sync" only enables sync metadata updates.  It
still doesn't force an fsync on data writes.

--Stephen



Re: scsi vs ide performance on fsync's

2001-03-08 Thread Pavel Machek


Hi!
> If not, then the drive could by all means optimise the access pattern
> provided it acked the data or provided the results in the same order as the
> instructions were given.  This would probably shorten the time for a new
> pathological set (distributed evenly across the disk surface, but all on
> the worst-possible angular offset compared to the previous) to (8ms seek
> time + 5ms rotational delay) * 4000 writes ~= 52 seconds (compared with
> around 120 seconds for the previous set with rotational delay factored in).
> Great, so you only need half as big a power store to guarantee writing that
> much data, but it's still too much.  Even with a 15000rpm drive and 5ms
> seek times, it would still be too much.

The drive can trivially seek to a reserved track and flush the data onto it,
all within 25 msec. Then it can move the data to the proper location on the
next power-up.
								Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
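The figures in the quoted estimate can be checked with a few lines of arithmetic. This is only a sketch: the function name is made up for illustration, and the 15000 rpm case assumes that the worst-case rotational delay is one full revolution (60000 / 15000 = 4 ms), which the original message does not spell out.

```python
def worst_case_flush_seconds(writes, seek_ms, rotational_ms):
    """Time to commit `writes` scattered blocks, paying one seek
    plus one rotational delay for each of them."""
    return writes * (seek_ms + rotational_ms) / 1000.0

# Figures quoted above: 8 ms seek, 5 ms rotational delay, 4000 writes.
scattered = worst_case_flush_seconds(4000, 8, 5)

# Assumption: a 15000 rpm drive needs 60000 / 15000 = 4 ms per revolution,
# taken here as the worst-case rotational delay, with the quoted 5 ms seek.
fast_drive = worst_case_flush_seconds(4000, 5, 60000 / 15000)

print(scattered)   # 52.0
print(fast_drive)  # 36.0
```

Both results are in the tens of seconds, whereas the reserved-track approach suggested in the reply needs a single seek and one sequential write, which is where the 25 msec figure comes from.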




Re: scsi vs ide performance on fsync's

2001-03-07 Thread Jens Axboe

On Wed, Mar 07 2001, Stephen C. Tweedie wrote:
> On Wed, Mar 07, 2001 at 09:15:36PM +0100, Jens Axboe wrote:
> > On Wed, Mar 07 2001, Stephen C. Tweedie wrote:
> > > 
> > > For most fs'es, that's not an issue.  The fs won't start writeback on
> > > the primary disk at all until the journal commit has been acknowledged
> > > as firm on disk.
> > 
> > But do you then force wait on that journal commit?
> 
> It doesn't matter too much --- it's only the writeback which is doing
> this (ext3 uses a separate journal thread for it), so any sleep is
> only there to wait for the moment when writeback can safely begin:
> users of the filesystem won't see any stalls.

Ok, but even if this is true for ext3 it may not be true for other
journalled fs. AFAIR, reiser is doing an explicit wait_on_buffer
which would then amount to quite a performance hit (speculation,
haven't measured).

> > A barrier operation is sufficient then. So you're saying don't
> > over design, a simple barrier is all you need?
> 
> Pretty much so.  The simple barrier is the only thing which can be
> effectively optimised at the hardware level with SCSI anyway.

True

-- 
Jens Axboe




Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie

Hi,

On Wed, Mar 07, 2001 at 09:15:36PM +0100, Jens Axboe wrote:
> On Wed, Mar 07 2001, Stephen C. Tweedie wrote:
> > 
> > For most fs'es, that's not an issue.  The fs won't start writeback on
> > the primary disk at all until the journal commit has been acknowledged
> > as firm on disk.
> 
> But do you then force wait on that journal commit?

It doesn't matter too much --- it's only the writeback which is doing
this (ext3 uses a separate journal thread for it), so any sleep is
only there to wait for the moment when writeback can safely begin:
users of the filesystem won't see any stalls.

> A barrier operation is sufficient then. So you're saying don't
> over design, a simple barrier is all you need?

Pretty much so.  The simple barrier is the only thing which can be
effectively optimised at the hardware level with SCSI anyway.

Cheers,
 Stephen



Re: scsi vs ide performance on fsync's

2001-03-07 Thread Jens Axboe

On Wed, Mar 07 2001, Stephen C. Tweedie wrote:
> On Wed, Mar 07, 2001 at 07:51:52PM +0100, Jens Axboe wrote:
> > On Wed, Mar 07 2001, Stephen C. Tweedie wrote:
> > 
> > My bigger concern is when the journalled fs has a log on a different
> > queue.
> 
> For most fs'es, that's not an issue.  The fs won't start writeback on
> the primary disk at all until the journal commit has been acknowledged
> as firm on disk.

But do you then force wait on that journal commit?

> Certainly for ext3, synchronisation between the log and the primary
> disk is no big thing.  What really hurts is writing to the log, where
> we have to wait for the log writes to complete before submitting the
> commit write (which is sequentially allocated just after the rest of
> the log blocks).  Specifying a barrier on the commit block would allow
> us to keep the log device streaming, and the fs can deal with
> synchronising the primary disk quite happily by itself.

A barrier operation is sufficient then. So you're saying don't
over design, a simple barrier is all you need?

-- 
Jens Axboe




Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie

Hi,

On Wed, Mar 07, 2001 at 07:51:52PM +0100, Jens Axboe wrote:
> On Wed, Mar 07 2001, Stephen C. Tweedie wrote:
> 
> My bigger concern is when the journalled fs has a log on a different
> queue.

For most fs'es, that's not an issue.  The fs won't start writeback on
the primary disk at all until the journal commit has been acknowledged
as firm on disk.

Certainly for ext3, synchronisation between the log and the primary
disk is no big thing.  What really hurts is writing to the log, where
we have to wait for the log writes to complete before submitting the
commit write (which is sequentially allocated just after the rest of
the log blocks).  Specifying a barrier on the commit block would allow
us to keep the log device streaming, and the fs can deal with
synchronising the primary disk quite happily by itself.

Cheers,
 Stephen
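The log-then-commit ordering described in this message can be imitated from user space, where fsync() is the only ordering tool available. The sketch below is an analogy, not ext3 code: the journal file name, the record contents and the function name are all hypothetical.

```python
import os
import tempfile

def commit_transaction(fd, log_blocks, commit_record):
    """Append the log blocks, wait for them, then append the commit record."""
    for block in log_blocks:
        os.write(fd, block)
    os.fsync(fd)   # wait: the log blocks must be reported written first
    os.write(fd, commit_record)
    os.fsync(fd)   # wait again: the transaction only counts once this returns

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "journal.log")  # hypothetical journal file
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        commit_transaction(fd, [b"block-1\n", b"block-2\n"], b"COMMIT\n")
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        journal = f.read()
```

The first fsync() corresponds to the wait that hurts in the description above. With a barrier on the commit block, the writer could queue the commit record immediately and leave the ordering to the lower layers.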



Re: scsi vs ide performance on fsync's

2001-03-07 Thread Jens Axboe

On Wed, Mar 07 2001, Stephen C. Tweedie wrote:
> > Yep, it's much harder than it seems. Especially because for the barrier
> > to be really useful, having inter-request dependencies becomes a
> > requirement. So you can say something like 'flush X and Y, but don't
> > flush Y before X is done'.
> 
> Yes.  Fortunately, the simplest possible barrier is just a matter of
> marking a request as non-reorderable, and then making sure that you
> both flush the elevator queue before servicing that request, and defer
> any subsequent requests until the barrier request has been satisfied.
> Once it has gone through, you can let through the deferred requests (in
> order, up to the point at which you encounter another barrier).

The above should have been inter-queue dependencies. For one queue
it's not a big issue, you basically described the whole sequence
above. Either sequence it as zero for a non-empty queue and make
sure the low level driver orders or flushes, or just hand it directly
to the device.

My bigger concern is when the journalled fs has a log on a different
queue.

> Only if the queue is empty can you give a barrier request directly to
> the driver.  The special optimisation you can do in this case with
> SCSI is to continue to allow new requests through even before the
> barrier has completed if the disk supports ordered queue tags.  

Yep, IDE will have to pay the price of a flush.

-- 
Jens Axboe




Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie

Hi,

On Wed, Mar 07, 2001 at 03:12:41PM +0100, Jens Axboe wrote:
> 
> Yep, it's much harder than it seems. Especially because for the barrier
> to be really useful, having inter-request dependencies becomes a
> requirement. So you can say something like 'flush X and Y, but don't
> flush Y before X is done'.

Yes.  Fortunately, the simplest possible barrier is just a matter of
marking a request as non-reorderable, and then making sure that you
both flush the elevator queue before servicing that request, and defer
any subsequent requests until the barrier request has been satisfied.
Once it has gone through, you can let through the deferred requests (in
order, up to the point at which you encounter another barrier).

Only if the queue is empty can you give a barrier request directly to
the driver.  The special optimisation you can do in this case with
SCSI is to continue to allow new requests through even before the
barrier has completed if the disk supports ordered queue tags.  

--Stephen
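The simplest barrier described above can be modelled as a toy single-queue elevator. This is a sketch of the ordering rule only, not kernel code: the Request type and dispatch_order() are invented for illustration, and sorting by sector stands in for whatever reordering the real elevator does.

```python
from dataclasses import dataclass

@dataclass
class Request:
    sector: int
    barrier: bool = False

def dispatch_order(queue):
    """Return requests in the order a barrier-aware elevator could issue them."""
    issued, batch = [], []
    for req in queue:
        if req.barrier:
            # Flush everything queued before the barrier, sorted by sector,
            issued.extend(sorted(batch, key=lambda r: r.sector))
            batch = []
            # then issue the barrier request itself, never reordered.
            issued.append(req)
        else:
            # Requests between barriers may be reordered freely.
            batch.append(req)
    issued.extend(sorted(batch, key=lambda r: r.sector))
    return issued

queue = [Request(90), Request(10), Request(50, barrier=True),
         Request(70), Request(20)]
order = [r.sector for r in dispatch_order(queue)]
print(order)  # [10, 90, 50, 20, 70]
```

The model covers ordering only. Deferring later requests until the barrier has actually completed, or letting them through early on a disk with ordered queue tags, is a matter of completion handling that this sketch leaves out.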



Re: scsi vs ide performance on fsync's

2001-03-07 Thread Linus Torvalds



On Wed, 7 Mar 2001, Jeremy Hansen wrote:
> 
> So in the meantime as this gets worked out on a lower level, we've decided
> to take the fsync() out of berkeley db for mysql transaction logs and
> mount the filesystem -o sync.
> 
> Can anyone perhaps tell me why this may be a bad idea?

Two reasons:
 - it doesn't help. The disk will _still_ do write buffering. It's the
   DISK, not the OS. It doesn't matter what you do.
 - your performance will suck.

Use fsync(). That's what it's there for. 

Tell people who don't have a UPS to disable write caching. If they have
one of the (apparently many) IDE disks that refuse to disable it, tell
them to either get a UPS, or to switch to another disk.

Linus




Re: scsi vs ide performance on fsync's

2001-03-07 Thread Jeremy Hansen


So in the meantime as this gets worked out on a lower level, we've decided
to take the fsync() out of berkeley db for mysql transaction logs and
mount the filesystem -o sync.

Can anyone perhaps tell me why this may be a bad idea?

Thanks
-jeremy



On Tue, 6 Mar 2001, Jeremy Hansen wrote:

>
> Ahh, now we're getting somewhere.
>
> IDE:
>
> jeremy:~# time ./xlog file.out fsync
>
> real    0m33.739s
> user    0m0.010s
> sys     0m0.120s
>
>
> so now this corresponds to the performance we're seeing on SCSI.
>
> So I guess what I'm wondering now is can or should anything be done about
> this on the SCSI side?
>
> Thanks
> -jeremy
>
> On Tue, 6 Mar 2001, Mike Black wrote:
>
> > Write caching is the culprit for the performance diff:
> >
> > On IDE:
> > time xlog /blah.dat fsync
> > 0.000u 0.190s 0:01.72 11.0% 0+0k 0+0io 91pf+0w
> > # hdparm -W 0 /dev/hda
> >
> > /dev/hda:
> >  setting drive write-caching to 0 (off)
> > # time xlog /blah.dat fsync
> > 0.000u 0.220s 0:50.60 0.4%  0+0k 0+0io 91pf+0w
> > # hdparm -W 1 /dev/hda
> >
> > /dev/hda:
> >  setting drive write-caching to 1 (on)
> > # time xlog /blah.dat fsync
> > 0.010u 0.230s 0:01.88 12.7% 0+0k 0+0io 91pf+0w
> >
> > On my SCSI setup:
> > # time xlog /usr5/blah.dat fsync
> > 0.020u 0.230s 0:30.48 0.8%  0+0k 0+0io 91pf+0w
> >
> >
> > 
> > Michael D. Black   Principal Engineer
> > [EMAIL PROTECTED]  321-676-2923,x203
> > http://www.csihq.com  Computer Science Innovations
> > http://www.csihq.com/~mike  My home page
> > FAX 321-676-2355
> > - Original Message -
> > From: "Andre Hedrick" <[EMAIL PROTECTED]>
> > To: "Linus Torvalds" <[EMAIL PROTECTED]>
> > Cc: "Douglas Gilbert" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> > Sent: Tuesday, March 06, 2001 2:12 AM
> > Subject: Re: scsi vs ide performance on fsync's
> >
> >
> > On Mon, 5 Mar 2001, Linus Torvalds wrote:
> >
> > > Well, it's fairly hard for the kernel to do much about that - it's almost
> > > certainly just IDE doing write buffering on the disk itself. No OS
> > > involved.
> >
> > I am pushing for WC to be defaulted in the off state, but as you know I
> > have a bigger fight than caching on my hands...
> >
> > > I don't know if there is any way to turn off a write buffer on an IDE disk.
> >
> > You want a forced set of commands to kill caching at init?
> >
> > Andre Hedrick
> > Linux ATA Development
> > ASL Kernel Development
> > 
> > -
> > ASL, Inc. Toll free: 1-877-ASL-3535
> > 1757 Houret Court Fax: 1-408-941-2071
> > Milpitas, CA 95035    Web: www.aslab.com
> >
> >
>
>

-- 
this is my sig.




Re: scsi vs ide performance on fsync's

2001-03-07 Thread Jens Axboe

On Wed, Mar 07 2001, Stephen C. Tweedie wrote:
> SCSI certainly lets us do both of these operations independently.  IDE
> has the sync/flush command afaik, but I'm not sure whether the IDE
> tagged command stuff has the equivalent of SCSI's ordered tag bits.
> Andre?

IDE has no concept of ordered tags...

-- 
Jens Axboe




Re: scsi vs ide performance on fsync's

2001-03-07 Thread Jens Axboe

On Wed, Mar 07 2001, Stephen C. Tweedie wrote:
> > SCSI has ordered tag, which fit the model Alan described quite nicely.
> > I've been meaning to implement this for some time, it would be handy
> > for journalled fs to use such a barrier. Since ATA doesn't do queueing
> > (at least not in current Linux), a synchronize cache is probably the
> > only way to go there.
> 
> Note that you also have to preserve the position of the barrier in the
> elevator queue, and you need to prevent LVM and soft raid from
> violating the barrier if different commands end up being sent to
> different disks.

Yep, it's much harder than it seems. Especially because for the barrier
to be really useful, having inter-request dependencies becomes a
requirement. So you can say something like 'flush X and Y, but don't
flush Y before X is done'.
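
A minimal sketch of that kind of inter-request dependency, in Python rather than kernel C. The `flush_order` helper and the request names are hypothetical, and `graphlib` (Python 3.9+) only stands in for whatever bookkeeping a block layer would really need:

```python
from graphlib import TopologicalSorter

def flush_order(deps):
    """Return one flush order that respects every 'must be done first' edge.

    deps maps a request name to the set of requests that have to complete
    before it may be flushed (a hypothetical representation).
    """
    return list(TopologicalSorter(deps).static_order())

# 'flush X and Y, but don't flush Y before X is done'
order = flush_order({"X": set(), "Y": {"X"}})
print(order)  # ['X', 'Y']
```

With only two requests the order is trivial; the point is that the queue has to carry the dependency edges along with the requests.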

-- 
Jens Axboe




Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie

Hi,

On Tue, Mar 06, 2001 at 09:37:20PM +0100, Jens Axboe wrote:
> 
> SCSI has ordered tag, which fit the model Alan described quite nicely.
> I've been meaning to implement this for some time, it would be handy
> for journalled fs to use such a barrier. Since ATA doesn't do queueing
> (at least not in current Linux), a synchronize cache is probably the
> only way to go there.

Note that you also have to preserve the position of the barrier in the
elevator queue, and you need to prevent LVM and soft raid from
violating the barrier if different commands end up being sent to
different disks.

--Stephen



Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie

Hi,

On Tue, Mar 06, 2001 at 10:44:34AM -0800, Linus Torvalds wrote:

> On Tue, 6 Mar 2001, Alan Cox wrote:
> > You want a write barrier. Write buffering (at least for short intervals) in
> > the drive is very sensible. The kernel needs to be able to send drivers a write
> > barrier which will not be completed with outstanding commands before the
> > barrier.
> 
> But Alan is right - we need a "sync" command or something. I don't know
> if IDE has one (it already might, for all I know).

Sync and barrier are very different models.  With barriers we can
enforce some element of write ordering without actually waiting for the
IOs to complete; with sync, we're explicitly asking to be told when
the data has become persistent.  We can make use of both of these.

SCSI certainly lets us do both of these operations independently.  IDE
has the sync/flush command afaik, but I'm not sure whether the IDE
tagged command stuff has the equivalent of SCSI's ordered tag bits.
Andre?

--Stephen



Re: scsi vs ide performance on fsync's

2001-03-07 Thread David Balazic

Andre Hedrick ([EMAIL PROTECTED]) wrote on Wed Mar 07 2001 - 01:58:44 EST :

> On Wed, 7 Mar 2001, Jonathan Morton wrote: 
 
[ snip ]
 
 
> > >Since all OSes that enable WC at init will flush 
> > >it at shutdown and do a periodic purge with in-activity. 
> > 
> > But Linux doesn't, as has been pointed out earlier. We need to fix Linux. 
> 
> Friend I have fixed this some time ago but it is bundled with TASKFILE 
> that is not going to arrive until 2.5. Because I need a way to execute 
> this and hold the driver until it is complete, regardless of the shutdown 
> method. 

I don't understand 100%.
Is TASKFILE required to do proper write cache flushing ?

> > >Err, last time I checked all good devices flush their write caching on their
> > >own to take advantage of having a maximum cache for prefetching.
> >
> > Which doesn't work if the buffer is filled up by the OS 0.5 seconds before
> > the power goes.
>
> Maybe that is why there is a vendor disk-cache dump zone on the edge of
> the platters...just maybe you need to buy your drives from somebody that
> does this and has a predictive sector stretcher as the energy from the
> inertia by the DC three-phase motor executes the dump.

So where is a list of drives that do this ?
www.list-of-hardware-that-doesnt-suck.com is not responding ...

> Ever wondered why modern drives have open collectors on the data bus?

no :-)


-- 
David Balazic
--
"Be excellent to each other." - Bill & Ted
- - - - - - - - - - - - - - - - - - - - - -



Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie

Hi,

On Wed, Mar 07, 2001 at 03:12:41PM +0100, Jens Axboe wrote:
 
> Yep, it's much harder than it seems. Especially because for the barrier
> to be really useful, having inter-request dependencies becomes a
> requirement. So you can say something like 'flush X and Y, but don't
> flush Y before X is done'.

Yes.  Fortunately, the simplest possible barrier is just a matter of
marking a request as non-reorderable, and then making sure that you
both flush the elevator queue before servicing that request, and defer
any subsequent requests until the barrier request has been satisfied.
Once it has gone through, you can let through the deferred requests (in
order, up to the point at which you encounter another barrier).

Only if the queue is empty can you give a barrier request directly to
the driver.  The special optimisation you can do in this case with
SCSI is to continue to allow new requests through even before the
barrier has completed if the disk supports ordered queue tags.  
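
The simple barrier rule above can be sketched as follows. This is a toy model, not elevator code: `dispatch_order` is a hypothetical helper, and sorting by sector only stands in for whatever reordering the real elevator does.

```python
def dispatch_order(requests):
    """Return the order in which requests would reach the driver.

    requests is a list of (sector, is_barrier) tuples in submission order.
    Requests between barriers may be reordered (here: sorted by sector).
    A barrier is never reordered: everything queued before it is flushed
    first, and everything after it is deferred until it has gone through.
    """
    out, batch = [], []
    for sector, is_barrier in requests:
        if is_barrier:
            out.extend(sorted(batch))  # flush the elevator queue first
            batch = []
            out.append(sector)         # then service the barrier itself
        else:
            batch.append(sector)       # reorderable until the next barrier
    out.extend(sorted(batch))          # deferred requests after the last barrier
    return out

reqs = [(70, False), (10, False), (40, True), (90, False), (20, False)]
print(dispatch_order(reqs))  # [10, 70, 40, 20, 90]
```

Sector 40 is the barrier: 10 and 70 are reordered among themselves but stay ahead of it, and 20 and 90 stay behind it.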

--Stephen



Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie

Hi,

On Wed, Mar 07, 2001 at 07:51:52PM +0100, Jens Axboe wrote:
> On Wed, Mar 07 2001, Stephen C. Tweedie wrote:
> 
> My bigger concern is when the journalled fs has a log on a different
> queue.

For most fs'es, that's not an issue.  The fs won't start writeback on
the primary disk at all until the journal commit has been acknowledged
as firm on disk.

Certainly for ext3, synchronisation between the log and the primary
disk is no big thing.  What really hurts is writing to the log, where
we have to wait for the log writes to complete before submitting the
commit write (which is sequentially allocated just after the rest of
the log blocks).  Specifying a barrier on the commit block would allow
us to keep the log device streaming, and the fs can deal with
synchronising the primary disk quite happily by itself.
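
A userspace analogue of the log-then-commit ordering described above, as a sketch only. It is not ext3 code; `commit_transaction`, the 512-byte blocks and the file name are all assumptions made for illustration:

```python
import os
import tempfile

def commit_transaction(fd, log_blocks, commit_block):
    """Write the log blocks, wait for them, then write the commit block."""
    for block in log_blocks:
        os.write(fd, block)
    os.fsync(fd)                # wait: log blocks must be stable first
    os.write(fd, commit_block)  # commit record sits right after the log blocks
    os.fsync(fd)                # wait again before treating the commit as firm

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "journal.img")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        commit_transaction(fd, [b"L" * 512, b"L" * 512], b"C" * 512)
    finally:
        os.close(fd)
    size = os.path.getsize(path)

print(size)  # 1536
```

The first `os.fsync` is the wait that hurts. With a barrier on the commit block, that wait could in principle be dropped and the log device kept streaming.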

Cheers,
 Stephen



Re: scsi vs ide performance on fsync's

2001-03-07 Thread Jens Axboe

On Wed, Mar 07 2001, Stephen C. Tweedie wrote:
> On Wed, Mar 07, 2001 at 07:51:52PM +0100, Jens Axboe wrote:
> > On Wed, Mar 07 2001, Stephen C. Tweedie wrote:
> > 
> > My bigger concern is when the journalled fs has a log on a different
> > queue.
> 
> For most fs'es, that's not an issue.  The fs won't start writeback on
> the primary disk at all until the journal commit has been acknowledged
> as firm on disk.

But do you then force wait on that journal commit?

> Certainly for ext3, synchronisation between the log and the primary
> disk is no big thing.  What really hurts is writing to the log, where
> we have to wait for the log writes to complete before submitting the
> commit write (which is sequentially allocated just after the rest of
> the log blocks).  Specifying a barrier on the commit block would allow
> us to keep the log device streaming, and the fs can deal with
> synchronising the primary disk quite happily by itself.

A barrier operation is sufficient then. So you're saying don't
over design, a simple barrier is all you need?

-- 
Jens Axboe




Re: scsi vs ide performance on fsync's

2001-03-07 Thread Stephen C. Tweedie

Hi,

On Wed, Mar 07, 2001 at 09:15:36PM +0100, Jens Axboe wrote:
> On Wed, Mar 07 2001, Stephen C. Tweedie wrote:
> > 
> > For most fs'es, that's not an issue.  The fs won't start writeback on
> > the primary disk at all until the journal commit has been acknowledged
> > as firm on disk.
> 
> But do you then force wait on that journal commit?

It doesn't matter too much --- it's only the writeback which is doing
this (ext3 uses a separate journal thread for it), so any sleep is
only there to wait for the moment when writeback can safely begin:
users of the filesystem won't see any stalls.

> A barrier operation is sufficient then. So you're saying don't
> over design, a simple barrier is all you need?

Pretty much so.  The simple barrier is the only thing which can be
effectively optimised at the hardware level with SCSI anyway.

Cheers,
 Stephen



Re: scsi vs ide performance on fsync's

2001-03-07 Thread Jens Axboe

On Wed, Mar 07 2001, Stephen C. Tweedie wrote:
> On Wed, Mar 07, 2001 at 09:15:36PM +0100, Jens Axboe wrote:
> > On Wed, Mar 07 2001, Stephen C. Tweedie wrote:
> > > 
> > > For most fs'es, that's not an issue.  The fs won't start writeback on
> > > the primary disk at all until the journal commit has been acknowledged
> > > as firm on disk.
> > 
> > But do you then force wait on that journal commit?
> 
> It doesn't matter too much --- it's only the writeback which is doing
> this (ext3 uses a separate journal thread for it), so any sleep is
> only there to wait for the moment when writeback can safely begin:
> users of the filesystem won't see any stalls.

Ok, but even if this is true for ext3 it may not be true for other
journalled fs. AFAIR, reiser is doing an explicit wait_on_buffer
which would then amount to quite a performance hit (speculation,
haven't measured).

> > A barrier operation is sufficient then. So you're saying don't
> > over design, a simple barrier is all you need?
> 
> Pretty much so.  The simple barrier is the only thing which can be
> effectively optimised at the hardware level with SCSI anyway.

True

-- 
Jens Axboe




Re: scsi vs ide performance on fsync's

2001-03-06 Thread Andre Hedrick

On Wed, 7 Mar 2001, Jonathan Morton wrote:

> Still doesn't make a difference - there is one revolution between writes,
> no matter where on disk it is.

Oh it does, because you are hitting the same sector with the same data.
Rotate your buffer and then you will see the difference.

> >Because of WinBench!
> >All the prefetch/caching are modeled to be optimized to that bench-mark.
> 
> Lies, damn lies, statistics, benchmarks, delivery dates.  Especially a
> consumer-oriented benchmark like WinBench.  It's perfectly natural to
> optimise for particular access patterns, but IMHO that doesn't excuse
> breaking the drive just to get a better benchmark score.

Obviously you have never been in the bowels of drive industry hell.
Why do you think there was a change in ATA-6 to require the
Write-Verify-Read to always return stuff from the platter?
Because the SOB's in storage LIE!  A real wake-up call for you is that
everything about the world of storage is a big-fat-whopper of a LIE.

Storage devices are BLACK-BOXES with the standards/rules to communicate
being dictated by the device not the host.  Storage devices are no better
than a Coke(tm) vending machine.  You push "Coke" it gives you "Coke".
You have not a clue as to how it arrives or where it came from.
Same thing about reading from a drive.

> That isn't the point!  I'm not talking about the physical mechanism, which
> indeed is often the same between one generation of SCSI and the next
> generation of IDE devices.  I'm talking about the IDE controller which is
> slapped on the bottom of said mechanism.  The mech can be of world-class
> quality, but if the controller is shot it doesn't cut the grain.

So there is a $5 difference in the cell-gates and the line drivers are more
powerful,  80GB ATA + $5 != 80GB SCSI.

> >Since all OSes that enable WC at init will flush
> >it at shutdown and do a periodic purge with in-activity.
> 
> But Linux doesn't, as has been pointed out earlier.  We need to fix Linux.

Friend I have fixed this some time ago but it is bundled with TASKFILE
that is not going to arrive until 2.5.  Because I need a way to execute
this and hold the driver until it is complete, regardless of the shutdown
method.

> >Err, last time I checked all good devices flush their write caching on their
> >own to take advantage of having a maximum cache for prefetching.
> 
> Which doesn't work if the buffer is filled up by the OS 0.5 seconds before
> the power goes.

Maybe that is why there is a vendor disk-cache dump zone on the edge of
the platters...just maybe you need to buy your drives from somebody that
does this and has a predictive sector stretcher as the energy from the
inertia by the DC three-phase motor executes the dump.

Ever wondered why modern drives have open collectors on the data bus?
Maybe to disconnect the power draw so that the motor, now a generator,
provides the needed power to complete the data dump...

> I'm sorry if this looks like another troll, but I really do like to clear
> up confusion.  I do accept that IDE now has good enough real performance
> for many purposes, but in terms of enforced quality it clearly lags behind
> the entire SCSI field.

I have no desire to debate the merits, but when your onboard host for ATA
starts shipping with GigaBit-Copper speeds then we can have a pissing
contest.

Cheers,

Andre Hedrick
Linux ATA Development
ASL Kernel Development
-
ASL, Inc. Toll free: 1-877-ASL-3535
1757 Houret Court Fax: 1-408-941-2071
Milpitas, CA 95035    Web: www.aslab.com




Re: scsi vs ide performance on fsync's

2001-03-06 Thread Jonathan Morton

>I am not going to bite on your flamebait, and you are free to waste your money.

I don't flamebait.  I was trying to clear up some confusion...

>No, SCSI does with queuing.
>I am saying that the ata/ide driver rips the heart out of the
>io_request_lock for way too darn long.  This means that upon execution of a
>request virtually all interrupts are whacked and the driver is dominating
>the system.  Given that IO's are limited to 128 sectors or one DMA PRD,
>this is vastly smaller than the SCSI transfer limit.

Ah, so the ATA driver hogs interrupts.  Nice.  Kinda explains why I can't
use the mouse on some systems when I use cdparanoia.

>Okay, real short: limit to two zones that are equal in size.
>The inner and outer, and the latter will cover more physical media than
>the former.  Simple Two zone model.

Still doesn't make a difference - there is one revolution between writes,
no matter where on disk it is.
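
As a worked figure for the one-revolution-per-write argument, here is a small sketch. It assumes the drive really does wait a full revolution between successive synchronous writes to the same sector, which is the premise above and not something measured here:

```python
def seconds_per_revolution(rpm):
    """Time for one platter revolution."""
    return 60.0 / rpm

def max_sync_writes_per_second(rpm):
    """Upper bound on rewrites of one sector if each waits a full revolution."""
    return rpm / 60.0

print(max_sync_writes_per_second(7200))               # 120.0
print(round(seconds_per_revolution(7200) * 1000, 2))  # 8.33 (milliseconds)
```

A 7200rpm drive that honestly commits each write cannot do much better than about 120 of them per second, regardless of which zone the sector sits in.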

>> Under those circumstances,
>> I would expect my 7200rpm Seagate to perform slower than my 1rpm IBM
>> *regardless* of seeking performance.  Seeking doesn't come into it!
>
>It does, because more RPM means more air-flow and more work to keep the
>position stable.

That's the engineers' problem, not ours.  In fact, it's not really a
problem because my IBM drive gave almost exactly the correct performance
result, even at 1rpm, therefore it's managing to keep the position
stable regardless of airflow.

>> Why does this sound familiar?
>
>Because of WinBench!
>All the prefetch/caching are modeled to be optimized to that bench-mark.

Lies, damn lies, statistics, benchmarks, delivery dates.  Especially a
consumer-oriented benchmark like WinBench.  It's perfectly natural to
optimise for particular access patterns, but IMHO that doesn't excuse
breaking the drive just to get a better benchmark score.

>> Personally, I feel the bottom line is rapidly turning into "if you have
>> critical data, don't put it on an IDE disk".  There are too many corners
>> cut when compared to ostensibly similar SCSI devices.  Call me a SCSI bigot
>> if you like - I realise SCSI is more expensive, but you get what you pay
>> for.
>
>Let me slap you in the face with a salami stick!
>ATA 7200 RPM Drives are using SCSI 7200 RPM Drive HDA's
>So you say ATA is Lame?  Then so was your SCSI 7200's.

That isn't the point!  I'm not talking about the physical mechanism, which
indeed is often the same between one generation of SCSI and the next
generation of IDE devices.  I'm talking about the IDE controller which is
slapped on the bottom of said mechanism.  The mech can be of world-class
quality, but if the controller is shot it doesn't cut the grain.

>Since all OSes that enable WC at init will flush
>it at shutdown and do a periodic purge with in-activity.

But Linux doesn't, as has been pointed out earlier.  We need to fix Linux.
Also, as I and someone else have also pointed out, there are drives in
circulation which refuse to turn off write caching, including one sitting
in my main workstation - the one which is rebooted the most often, simply
because I need to use Windoze 95 for a few onerous tasks.  I haven't
suffered disk corruption yet, because Linux unmounts the filesystems and
flushes its own buffers several seconds before powering down, and uses a
non-pathological access pattern, but I sure don't want to see the first
time this doesn't work properly.

>Err, last time I checked all good devices flush their write caching on their
>own to take advantage of having a maximum cache for prefetching.

Which doesn't work if the buffer is filled up by the OS 0.5 seconds before
the power goes.

I'm sorry if this looks like another troll, but I really do like to clear
up confusion.  I do accept that IDE now has good enough real performance
for many purposes, but in terms of enforced quality it clearly lags behind
the entire SCSI field.

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r- y+
-END GEEK CODE BLOCK-





Re: scsi vs ide performance on fsync's

2001-03-06 Thread Mark Hahn

> itself is a bad thing, particularly given the amount of CPU overhead that
> IDE drives demand while attached to the controller (orders of magnitude
> higher than a good SCSI controller) - the more overhead we can hand off to

I know this is just a troll by a scsi-believer, but I'm biting anyway.

on current machines and disks, ide costs a few % CPU, depending on 
which CPU, disk, kernel, the sustained bandwidth, etc.  I've measured
this using the now-trendy method of noticing how much the IO costs
a separate, CPU-bound benchmark: load = 1 - (unloadedPerf / loadedPerf).
my cheesy duron/600 desktop typically shows ~2% actual cost when running
bonnie's block IO tests.
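
The load formula above, written out as a sketch. It only gives a positive figure if "perf" is read as elapsed time of the CPU-bound benchmark (longer when the disk is busy); that reading, and the timings used below, are assumptions and not numbers from the measurement described:

```python
def io_load(unloaded_perf, loaded_perf):
    """load = 1 - (unloadedPerf / loadedPerf), as given in the message.

    Assumption: 'perf' is the elapsed time of the CPU-bound benchmark, so
    the loaded run takes longer and the result is a positive fraction.
    """
    return 1.0 - (unloaded_perf / loaded_perf)

# Hypothetical timings: 98 s alone, 100 s while the disk benchmark runs.
load = io_load(98.0, 100.0)
print(f"{load:.1%}")  # 2.0%
```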




Re: scsi vs ide performance on fsync's

2001-03-06 Thread Jens Axboe

On Tue, Mar 06 2001, David Balazic wrote:
> > > Wrong model 
> > > 
> > > You want a write barrier. Write buffering (at least for short intervals)
> > > in the drive is very sensible. The kernel needs to be able to send
> > > drivers a write barrier which will not be completed with outstanding
> > > commands before the 
> > > barrier. 
> > 
> > Agreed. 
> > 
> > Write buffering is incredibly useful on a disk - for all the same reasons 
> > that an OS wants to do it. The disk can use write buffering to speed up 
> > writes a lot - not just lower the _perceived_ latency by the OS, but to 
> > actually improve performance too. 
> > 
> > But Alan is right - we need a "sync" command or something. I don't know 
> > if IDE has one (it already might, for all I know). 
> 
> ATA , SCSI and ATAPI all have a FLUSH_CACHE command. (*)
> Whether the drives implement it is another question ...

(Usually called SYNCHRONIZE_CACHE btw)

SCSI has ordered tag, which fit the model Alan described quite nicely.
I've been meaning to implement this for some time, it would be handy
for journalled fs to use such a barrier. Since ATA doesn't do queueing
(at least not in current Linux), a synchronize cache is probably the
only way to go there.
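The difference between a barrier and a full cache flush can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the queue format, the BARRIER marker and the sort-by-LBA reordering policy are invented for illustration, and do not describe any real drive firmware or the Linux block layer.

```python
from typing import List, Optional

BARRIER = None  # hypothetical marker standing in for an ordered tag


def commit_order(queue: List[Optional[int]]) -> List[int]:
    """Return the order in which a toy drive commits queued writes.

    Writes are identified by LBA. The drive may reorder freely (here:
    sort by LBA) between barriers, but no write crosses a barrier.
    """
    committed: List[int] = []
    segment: List[int] = []
    for cmd in queue:
        if cmd is BARRIER:
            committed.extend(sorted(segment))  # drain everything before the barrier
            segment = []
        else:
            segment.append(cmd)
    committed.extend(sorted(segment))
    return committed


# Journal blocks first, then a barrier, then the commit record and more data.
queue = [900, 10, 500, BARRIER, 20, 700]
print(commit_order(queue))  # [10, 500, 900, 20, 700]
```

The host never waits in this model; it only constrains ordering. A synchronize-cache command is the blunter tool: the host stops and waits until everything queued so far is on the platters.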

> (*) references : 
>   ATA-6 draft standard from www.t13.org
>   MtFuji document from ftp.avc-pioneer.com

-- 
Jens Axboe




Re: scsi vs ide performance on fsync's

2001-03-06 Thread David Balazic

Linus Torvalds himself wrote :

> On Tue, 6 Mar 2001, Alan Cox wrote: 
> > 
> > > > I don't know if there is any way to turn of a write buffer on an IDE disk. 
> > > You want a forced set of commands to kill caching at init? 
> > 
> > Wrong model 
> > 
> > You want a write barrier. Write buffering (at least for short intervals) in 
> > the drive is very sensible. The kernel needs to able to send drivers a write 
> > barrier which will not be completed with outstanding commands before the 
> > barrier. 
> 
> Agreed. 
> 
> Write buffering is incredibly useful on a disk - for all the same reasons 
> that an OS wants to do it. The disk can use write buffering to speed up 
> writes a lot - not just lower the _perceived_ latency by the OS, but to 
> actually improve performance too. 
> 
> But Alan is right - we needs a "sync" command or something. I don't know 
> if IDE has one (it already might, for all I know). 

ATA , SCSI and ATAPI all have a FLUSH_CACHE command. (*)
Whether the drives implement it is another question ...

(*) references : 
  ATA-6 draft standard from www.t13.org
  MtFuji document from 


-- 
David Balazic
--
"Be excellent to each other." - Bill & Ted
- - - - - - - - - - - - - - - - - - - - - -



Re: scsi vs ide performance on fsync's

2001-03-06 Thread Andre Hedrick


Jonathan,

I am not going to bite on your flame bait, and you are free to waste your money.

On Tue, 6 Mar 2001, Jonathan Morton wrote:

> >> It's pretty clear that the IDE drive(r) is *not* waiting for the physical
> >> write to take place before returning control to the user program, whereas
> >> the SCSI drive(r) is.  Both devices appear to be performing the write
> >
> >Wrong, IDE does not unplug thus the request is almost, I hate to admit it
> >SYNC and not ASYNC :-(  Thus if the drive acks that it has the data then
> >the driver lets go.
> 
> Uh, run that past me again?  You are saying that because the IDE drive hogs
> the bus until the write is complete or the driver forcibly disconnects, you
> make the driver disconnect to save time?  Or (more likely) have I totally
> misread you...

No, SCSI does, with queuing.
I am saying that the ata/ide driver hogs the io_request_lock for way too
darn long.  This means that while a request executes, virtually all
interrupts are whacked and the driver is dominating the system.  Given that
IOs are limited to 128 sectors or one DMA PRD, this is vastly smaller than
the SCSI transfer limit.

Since your test is not using "Write Verify Read", all drives are going
to lie.  Only this command will force the stuff to hit the platters and
return a read out of the dirty-cache.

> >pre-seek.  Thus the question is were is the drive leaving the heads when
> >not active?  It does not appear to be in the zone 1 region.
> 
> Duh...  I don't quite see what you're saying here, either.  The test is a

Okay, real short: limit it to two zones that are equal in size, the inner
and the outer; the latter will cover more physical media than the former.
Simple two-zone model.

> continuous rewrite of the same sector of the disk, so the head shouldn't be
> moving *at all* until it's all over.  In addition, the drive can't start

True, and you slip a rev every time.

> writing the sector when it's just finished writing it, so it has to wait
> for the rotation to bring it back round again.  Under those circumstances,
> I would expect my 7200rpm Seagate to perform slower than my 10000rpm IBM
> *regardless* of seeking performance.  Seeking doesn't come into it!

It does, because more RPM means more air-flow and more work to keep the
position stable.

> >Thus if your drive is one of those that does a stress test check that goes:
> >"this bozo did not really mean to turn off write caching, renabling "
> 
> Why does this sound familiar?

Because of WinBench!
All the prefetch/caching is modeled to be optimized for that benchmark.

> Personally, I feel the bottom line is rapidly turning into "if you have
> critical data, don't put it on an IDE disk".  There are too many corners
> cut when compared to ostensibly similar SCSI devices.  Call me a SCSI bigot
> if you like - I realise SCSI is more expensive, but you get what you pay
> for.

Let me slap you in the face with a salami stick!
ATA 7200 RPM drives are using SCSI 7200 RPM drive HDAs.
So you say ATA is lame?  Then so were your SCSI 7200s.

> Of course, under normal circumstances, you leave write-caching and UDMA on,
> and you don't use a pathological stress-test like we've been doing.  That
> gives the best performance.  But sometimes it's necessary to use these
> "pathological" access patterns to achieve certain system functions.
> Suppose, harking back to the Windows data-corruption scenario mentioned
> earlier, that just before powering off you stuffed several MB of data,
> scattered across the disk, into said disk and waited for said disk to say
> "yup, i've got that", then powered down.  Recent drives have very large
> (2MB?) on-board caches, so how long does it take for a pathological pattern
> of these to be committed to physical media?  Can the drive sustain it's own
> power long enough to do this (highly unlikely)?  So the drive *must* be
> able to tell the OS when it's actually committed the data to media, or risk
> *serious* data corruption.

OH...you are talking about the one IBM drive that is goat-screwed...
The one that is too stupid to use the energy of the platters to drop the
data in the vendor power-down strip...yet it dumps the buffer in a panic..

ERM, that is a bad drive, regardless of whether they publish an erratum
stating that only good HOSTS that issue a flush-cache prior to power-off are
to be certified...well, maybe if they did not default the WC to on then it
would be a NOP of the design error, since all OSes that enable WC at init
will flush it at shutdown and do a periodic purge during inactivity.

> Pathological shutdown pattern:  assuming scatter-gather is not allowed (for
> IDE), and a 20ms full-stroke seek, write sectors at alternately opposite
> ends of the disk, working inwards until the buffer is full.  512-byte
> sectors, 2MB of them, is 4000 writes * 20ms = around 80 seconds (not
> including rotational delay, either).  Last time I checked, you'd need a
> capacitor array the size of the entire computer case to store 

Re: scsi vs ide performance on fsync's

2001-03-06 Thread Linus Torvalds



On Tue, 6 Mar 2001, Alan Cox wrote:
>
> > > I don't know if there is any way to turn of a write buffer on an IDE disk.
> > You want a forced set of commands to kill caching at init?
> 
> Wrong model
> 
> You want a write barrier. Write buffering (at least for short intervals) in
> the drive is very sensible. The kernel needs to able to send drivers a write
> barrier which will not be completed with outstanding commands before the
> barrier.

Agreed.

Write buffering is incredibly useful on a disk - for all the same reasons
that an OS wants to do it. The disk can use write buffering to speed up
writes a lot - not just lower the _perceived_ latency by the OS, but to
actually improve performance too.

But Alan is right - we need a "sync" command or something. I don't know
if IDE has one (it already might, for all I know).

Linus




Re: scsi vs ide performance on fsync's

2001-03-06 Thread Jonathan Morton

>Jonathan Morton ([EMAIL PROTECTED]) wrote :
>
>> The OS needs to know the physical act of writing data has finished
>>before
>> it tells the m/board to cut the power - period. Pathological data sets
>> included - they are the worst case which every engineer must take into
>> account. Out of interest, does Linux guarantee this, in the light of what
>> we've uncovered? If so, perhaps it could use the same technique to fix
>> fdatasync() and family...
>
>Linux currently ignores write-cache, AFAICT.
>Recently I asked a similar question , about flushing drive caches at
>shutdown :

>On Mon, Feb 19, 2001 at 01:45:57PM +0100, David Balazic wrote:

>> It is a good idea IMO to flush the write cache of storage devices
>> at shutdown and other critical moments.
>
>Not needed. All device drivers should disable write caches of
>their devices, that need another signal than switching it off by
>the power button to flush themselves.

Sounds like a sensible place to implement it - in the device driver.  I
also note the existence of an ATA flush-buffer command, which should
probably be used in sync() and family.  The call(s) to the sync() family on
shutdown should probably be performed by the filesystem itself on unmount
(or remount read-only), and if journalled filesystems need synchronisation
they should use sync() (or a more fine-grained version) themselves as
necessary.

Doesn't sound like too much of a headache to implement, to me - unless some
drives ignore the ATA FLUSH command, in which case said drives can be
considered seriously broken.  :P  I don't agree that write-caching in
itself is a bad thing, particularly given the amount of CPU overhead that
IDE drives demand while attached to the controller (orders of magnitude
higher than a good SCSI controller) - the more overhead we can hand off to
dedicated hardware, the better.  What does matter is that drives
implementing write-caching are handled in a safe and efficient manner,
especially in cases where they refuse to turn such caching off (eg. my
Seagate Barracuda *glares at drive*).

Recalling my recent comments on worst-case drive-shutdown timings, I also
remember seeing drives with 18ms *average* seek times quite recently - this
was a Quantum Bigfoot (yes, a 5.25" HD), found in a low-end Compaq desktop
- if anyone still believes Compaq makes high-quality machines for their
low-end market, they're totally mistaken.  The machine sped up quite a lot
when a new 3.5" IBM DeskStar was installed, with an 8.5ms average seek and
an almost doubling in rotational speed.  :)

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r- y+
-END GEEK CODE BLOCK-





Re: scsi vs ide performance on fsync's

2001-03-06 Thread Gregory Maxwell

On Tue, Mar 06, 2001 at 06:14:15PM +0100, David Balazic wrote:
[snip]
> Hardware Level caching is only good for OSes which have broken
> drivers and broken caching (like plain old DOS).
> 
> Linux does a good job in caching and cache control at software
> level.

Read caching, yes. But for writes, the drive can often do a lot more
optimization because of its synchronous operation with the platter and
greater knowledge of internal disk geometry.

What would be useful, as Alan said, is a barrier operation.



Re: scsi vs ide performance on fsync's

2001-03-06 Thread David Balazic

(( please CC me , not subscribed , [EMAIL PROTECTED] ))

Jonathan Morton ([EMAIL PROTECTED]) wrote :

> The OS needs to know the physical act of writing data has finished before 
> it tells the m/board to cut the power - period. Pathological data sets  
> included - they are the worst case which every engineer must take into  
> account. Out of interest, does Linux guarantee this, in the light of what
> we've uncovered? If so, perhaps it could use the same technique to fix
> fdatasync() and family...

Linux currently ignores write-cache, AFAICT.
Recently I asked a similar question , about flushing drive caches at shutdown :
Subject : "Flusing caches on shutdown"
message archived at :
http://boudicca.tux.org/hypermail/linux-kernel/2001week08/0157.html
Body attached at end of this message.

The answer ( and only reply ) was :
[ archived at : http://boudicca.tux.org/hypermail/linux-kernel/2001week08/0211.html ]
--- begin quote ---
From: Ingo Oeser ([EMAIL PROTECTED])

On Mon, Feb 19, 2001 at 01:45:57PM +0100, David Balazic wrote:
> It is a good idea IMO to flush the write cache of storage devices 
> at shutdown and other critical moments. 
 
Not needed. All device drivers should disable write caches of
their devices, that need another signal than switching it off by  
the power button to flush themselves.
 
> Losing data at powerdown due to write caches has been reported,
> so this is not a theoretical problem. Also the journaled filesystems
> are safe only in theory if the journal is not stored on non-volatile
> memory, which is not guaranteed in the current kernel.
 
Fine. If users/admins have write caching enabled, they either
know what they do, or should disable it (which is the default for
all mass storage drivers AFAIK).
 
Hardware Level caching is only good for OSes which have broken
drivers and broken caching (like plain old DOS).

Linux does a good job in caching and cache control at software
level.
 
Regards

Ingo Oeser
--- end quote ---

My original mail :
--- begin quote ---
   (( CC me the replies, as I'm not subscribed to LKML ))

   Hi! 
 
   It is a good idea IMO to flush the write cache of storage devices
   at shutdown and other critical moments.
   I browsed through linux-2.4.1 and see no use of the SYNCHRONIZE CACHE
   SCSI command ( curiously it is defined in several other files
   besides include/scsi/scsi.h , grep returns :
   drivers/scsi/pci2000.h:#define SCSIOP_SYNCHRONIZE_CACHE 0x35
   drivers/scsi/psi_dale.h:#define SCSIOP_SYNCHRONIZE_CACHE 0x35
   drivers/scsi/psi240i.h:#define SCSIOP_SYNCHRONIZE_CACHE 0x35
   ) 

   I couldn't find evidence of the use of the equivalent ATA command either
   ( FLUSH CACHE , command code E7h ).
   Also add ATAPI to the list. ( and all other interfaces. I checked just SCSI
   and ATA )

   Losing data at powerdown due to write caches has been reported,
   so this is not a theoretical problem. Also the journaled filesystems
   are safe only in theory if the journal is not stored on non-volatile
   memory, which is not guaranteed in the current kernel.

   What is the official word on this issue ?
   I think this is important to the "enterprise" guys, at the least.
   
   Sincerely,
   david 
   
   PS: CC me , as I'm not subscribed to LKML
--- end quote ---

-- 
David Balazic
--
"Be excellent to each other." - Bill & Ted
- - - - - - - - - - - - - - - - - - - - - -




Re: scsi vs ide performance on fsync's

2001-03-06 Thread Jonathan Morton

>On Tue, 6 Mar 2001, Mike Black wrote:
>
>> Write caching is the culprit for the performance diff:

Indeed, and my during-the-boring-lecture benchmark on my 18Gb IBM
TravelStar bears this out.  I was confused earlier by the fact that one of
my Seagate drives blatantly ignores the no-write-caching request I sent it.
:P


At 4:02 pm +0000 6/3/2001, Jeremy Hansen wrote:

>Ahh, now we're getting somewhere.

>so now this corresponds to the performance we're seeing on SCSI.
>
>So I guess what I'm wondering now is can or should anything be done about
>this on the SCSI side?

Maybe, it depends on your perspective.  In my personal opinion, the IDE
behaviour is incorrect and some way of dealing with it (while still
retaining the benefits of write-caching for normal applications) would be
highly desirable.  However, some applications may like or partially rely on
that behaviour, to gain better on-disk data consistency while not suffering
too much in performance (eg. the transaction database mentioned by at least
one poster).

The way to make all parties happy is to fix the IDE driver (or drives!) and
make sure an *alternative* syscall is available which flushes the buffers
asynchronously, as per the current IDE behaviour.  It shouldn't be too hard
to make the SCSI driver use that behaviour in the alternative syscall
(which may already exist, I don't know Linux well enough to say).

May this be a warning to all hardware manufacturers who "tweak" their
hardware to gain better benchmark results without actually increasing
performance - you *will* be found out!

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r- y+
-END GEEK CODE BLOCK-





Re: scsi vs ide performance on fsync's

2001-03-06 Thread Jeremy Hansen


Ahh, now we're getting somewhere.

IDE:

jeremy:~# time ./xlog file.out fsync

real    0m33.739s
user    0m0.010s
sys     0m0.120s


so now this corresponds to the performance we're seeing on SCSI.

So I guess what I'm wondering now is can or should anything be done about
this on the SCSI side?

Thanks
-jeremy

On Tue, 6 Mar 2001, Mike Black wrote:

> Write caching is the culprit for the performance diff:
>
> On IDE:
> time xlog /blah.dat fsync
> 0.000u 0.190s 0:01.72 11.0% 0+0k 0+0io 91pf+0w
> # hdparm -W 0 /dev/hda
>
> /dev/hda:
>  setting drive write-caching to 0 (off)
> # time xlog /blah.dat fsync
> 0.000u 0.220s 0:50.60 0.4%  0+0k 0+0io 91pf+0w
> # hdparm -W 1 /dev/hda
>
> /dev/hda:
>  setting drive write-caching to 1 (on)
> # time xlog /blah.dat fsync
> 0.010u 0.230s 0:01.88 12.7% 0+0k 0+0io 91pf+0w
>
> On my SCSI setup:
> # time xlog /usr5/blah.dat fsync
> 0.020u 0.230s 0:30.48 0.8%  0+0k 0+0io 91pf+0w
>
>
> 
> Michael D. Black   Principal Engineer
> [EMAIL PROTECTED]  321-676-2923,x203
> http://www.csihq.com  Computer Science Innovations
> http://www.csihq.com/~mike  My home page
> FAX 321-676-2355
> - Original Message -
> From: "Andre Hedrick" <[EMAIL PROTECTED]>
> To: "Linus Torvalds" <[EMAIL PROTECTED]>
> Cc: "Douglas Gilbert" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Sent: Tuesday, March 06, 2001 2:12 AM
> Subject: Re: scsi vs ide performance on fsync's
>
>
> On Mon, 5 Mar 2001, Linus Torvalds wrote:
>
> > Well, it's fairly hard for the kernel to do much about that - it's almost
> > certainly just IDE doing write buffering on the disk itself. No OS
> > involved.
>
> I am pushing for WC to be defaulted in the off state, but as you know I
> have a bigger fight than caching on my hands...
>
> > I don't know if there is any way to turn of a write buffer on an IDE disk.
>
> You want a forced set of commands to kill caching at init?
>
> Andre Hedrick
> Linux ATA Development
> ASL Kernel Development
> 
> -
> ASL, Inc. Toll free: 1-877-ASL-3535
> 1757 Houret Court Fax: 1-408-941-2071
> Milpitas, CA 95035Web: www.aslab.com
>

-- 
this is my sig.




Re: scsi vs ide performance on fsync's

2001-03-06 Thread Jonathan Morton

>> Pathological shutdown pattern:  assuming scatter-gather is not allowed (for
>> IDE), and a 20ms full-stroke seek, write sectors at alternately opposite
>> ends of the disk, working inwards until the buffer is full.  512-byte
>> sectors, 2MB of them, is 4000 writes * 20ms = around 80 seconds (not
>> including rotational delay, either).  Last time I checked, you'd need a
>> capacitor array the size of the entire computer case to store enough power
>> to allow the drive to do this after system shutdown, and I don't remember
>> seeing LiIon batteries strapped to the bottom of my HDs.  Admittedly, any
>> sane OS doesn't actually use that kind of write pattern on shutdown, but
>> the drive can't assume that.
>
>But since the drive has everything in cache, it can just write
>out both bunches of sectors in an order which minimises disk
>seek time ...
>
>(yes, the drives don't guarantee write ordering either, but that
>shouldn't come as a big surprise when they don't guarantee that
>data makes it to disk ;))

That would be true for SCSI devices - I understand the controllers and/or
drives support "scatter-gather" which allows a drive to optimise its seek
pattern in the manner you describe.  However, I'm not sure whether an IDE
drive is allowed to do this.  I'm reasonably sure that I heard somewhere
that IDE drives have to complete transactions in the specified order as far
as the host is concerned - what I'm unsure of is whether this also applies
to mechanical head movement.

If not, then the drive could by all means optimise the access pattern
provided it acked the data or provided the results in the same order as the
instructions were given.  This would probably shorten the time for a new
pathological set (distributed evenly across the disk surface, but all on
the worst-possible angular offset compared to the previous) to (8ms seek
time + 5ms rotational delay) * 4000 writes ~= 52 seconds (compared with
around 120 seconds for the previous set with rotational delay factored in).
Great, so you only need half as big a power store to guarantee writing that
much data, but it's still too much.  Even with a 15000rpm drive and 5ms
seek times, it would still be too much.
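The estimates above can be reproduced with a few lines of arithmetic. This is a sketch only: the 10 ms rotational delay used for the 120-second figure and the full-revolution delay used for the 15000rpm case are assumptions, since the message gives just the totals. Note also that 2MB of 512-byte sectors is strictly 4096 writes; the round 4000 from the message is kept here.

```python
def flush_time_seconds(writes: int, seek_ms: float, rotational_ms: float = 0.0) -> float:
    """Worst-case time to commit `writes` scattered sectors, one seek each."""
    return writes * (seek_ms + rotational_ms) / 1000.0


WRITES = 4000  # rounded, as in the message

# Original pattern: 20 ms full-stroke seek, rotation ignored.
print(flush_time_seconds(WRITES, 20.0))        # 80.0
# Same pattern with an assumed 10 ms rotational delay per write.
print(flush_time_seconds(WRITES, 20.0, 10.0))  # 120.0
# Reordered pattern: 8 ms seek + 5 ms rotational delay.
print(flush_time_seconds(WRITES, 8.0, 5.0))    # 52.0

# 15000rpm drive with 5 ms seeks, assuming a full revolution per write.
ms_per_rev = 60_000.0 / 15000
print(flush_time_seconds(WRITES, 5.0, ms_per_rev))  # 36.0
```

Even the most optimistic of these figures is tens of seconds, far longer than a drive could plausibly keep itself powered after the supply is cut.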

The OS needs to know the physical act of writing data has finished before
it tells the m/board to cut the power - period.  Pathological data sets
included - they are the worst case which every engineer must take into
account.  Out of interest, does Linux guarantee this, in the light of what
we've uncovered?  If so, perhaps it could use the same technique to fix
fdatasync() and family...

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r- y+
-END GEEK CODE BLOCK-





Re: scsi vs ide performance on fsync's

2001-03-06 Thread Mike Black

Write caching is the culprit for the performance diff:

On IDE:
time xlog /blah.dat fsync
0.000u 0.190s 0:01.72 11.0% 0+0k 0+0io 91pf+0w
# hdparm -W 0 /dev/hda

/dev/hda:
 setting drive write-caching to 0 (off)
# time xlog /blah.dat fsync
0.000u 0.220s 0:50.60 0.4%  0+0k 0+0io 91pf+0w
# hdparm -W 1 /dev/hda

/dev/hda:
 setting drive write-caching to 1 (on)
# time xlog /blah.dat fsync
0.010u 0.230s 0:01.88 12.7% 0+0k 0+0io 91pf+0w

On my SCSI setup:
# time xlog /usr5/blah.dat fsync
0.020u 0.230s 0:30.48 0.8%  0+0k 0+0io 91pf+0w
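For readers without the test program at hand, a rough stand-in for this kind of test might look like the sketch below. It is not xlog.c, whose source is not reproduced in this part of the thread; the function name, the 512-byte record and the write count are assumptions made for illustration.

```python
import os
import tempfile
import time


def timed_fsync_writes(path: str, count: int = 200, record: bytes = b"x" * 512) -> float:
    """Append `count` records to `path`, calling fsync() after each one.

    Returns the elapsed wall-clock time in seconds.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, record)
            os.fsync(fd)  # ask the kernel to push the data to the device
        return time.perf_counter() - start
    finally:
        os.close(fd)


# Small run in a scratch directory; use a file on the disk under test instead.
with tempfile.TemporaryDirectory() as tmp:
    elapsed = timed_fsync_writes(os.path.join(tmp, "blah.dat"), count=20)
    print(f"20 fsync'd writes took {elapsed:.3f}s")
```

If the elapsed time is far below one platter revolution per write, the data is most likely being acknowledged from a cache somewhere along the path rather than from the media.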



Michael D. Black   Principal Engineer
[EMAIL PROTECTED]  321-676-2923,x203
http://www.csihq.com  Computer Science Innovations
http://www.csihq.com/~mike  My home page
FAX 321-676-2355
- Original Message -
From: "Andre Hedrick" <[EMAIL PROTECTED]>
To: "Linus Torvalds" <[EMAIL PROTECTED]>
Cc: "Douglas Gilbert" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Tuesday, March 06, 2001 2:12 AM
Subject: Re: scsi vs ide performance on fsync's


On Mon, 5 Mar 2001, Linus Torvalds wrote:

> Well, it's fairly hard for the kernel to do much about that - it's almost
> certainly just IDE doing write buffering on the disk itself. No OS
> involved.

I am pushing for WC to be defaulted in the off state, but as you know I
have a bigger fight than caching on my hands...

> I don't know if there is any way to turn of a write buffer on an IDE disk.

You want a forced set of commands to kill caching at init?

Andre Hedrick
Linux ATA Development
ASL Kernel Development

-
ASL, Inc. Toll free: 1-877-ASL-3535
1757 Houret Court Fax: 1-408-941-2071
Milpitas, CA 95035Web: www.aslab.com




Re: scsi vs ide performance on fsync's

2001-03-06 Thread Jonathan Morton

>> i assume you meant to time the xlog.c program?  (or did i miss another
>> program on the thread?)

Yes.

>> i've an IBM-DJSA-210 (travelstar 10GB, 5411rpm) which appears to do
>> *something* with the write cache flag -- it gets 0.10s elapsed real time
>> in default config; and gets 2.91s if i do "hdparm -W 0".
>>
>> ditto for an IBM-DTLA-307015 (deskstar 15GB 7200rpm) -- varies from .15s
>> with write-cache to 1.8s without.
>>
>> and an IBM-DTLA-307075 (deskstar 75GB 7200rpm) varies from .03s to 1.67s.
>>
>> of course 1.8s is nowhere near enough time for 200 writes to complete.
>
>hi, not enough sleep, can't do math.  1.67s is exactly the ballpark you'd
>expect for 200 writes to a correctly functioning 7200rpm disk.  and the
>travelstar appears to be doing the right thing as well.

I was just about to point that out.  :)  I ran the program with 2000
packets in order to magnify the difference.

So, it appears that the IBM IDE drives are doing the "right thing" when
write-caching is switched off, but the Seagate drive (at least the one I'm
using) appears not to turn the write-caching off at all.  I want to try
this out with some other drives, including a Seagate SCSI drive and a
different Seagate IDE drive (attached to a non-UDMA controller), and
perhaps a couple of older drives which I just happen to have lying around
(particularly a Maxtor and an old TravelStar with very little cache).
That'll have to wait until later, though - university work beckons.  :(

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r- y+
-END GEEK CODE BLOCK-





Re: scsi vs ide performance on fsync's

2001-03-06 Thread Rik van Riel

On Tue, 6 Mar 2001, Jonathan Morton wrote:

> Pathological shutdown pattern:  assuming scatter-gather is not allowed (for
> IDE), and a 20ms full-stroke seek, write sectors at alternately opposite
> ends of the disk, working inwards until the buffer is full.  512-byte
> sectors, 2MB of them, is 4000 writes * 20ms = around 80 seconds (not
> including rotational delay, either).  Last time I checked, you'd need a
> capacitor array the size of the entire computer case to store enough power
> to allow the drive to do this after system shutdown, and I don't remember
> seeing LiIon batteries strapped to the bottom of my HDs.  Admittedly, any
> sane OS doesn't actually use that kind of write pattern on shutdown, but
> the drive can't assume that.

But since the drive has everything in cache, it can just write
out both bunches of sectors in an order which minimises disk
seek time ...

(yes, the drives don't guarantee write ordering either, but that
shouldn't come as a big surprise when they don't guarantee that
data makes it to disk ;))

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: scsi vs ide performance on fsync's

2001-03-06 Thread dean gaudet

On Tue, 6 Mar 2001, dean gaudet wrote:

> i assume you meant to time the xlog.c program?  (or did i miss another
> program on the thread?)
>
> i've an IBM-DJSA-210 (travelstar 10GB, 5411rpm) which appears to do
> *something* with the write cache flag -- it gets 0.10s elapsed real time
> in default config; and gets 2.91s if i do "hdparm -W 0".
>
> ditto for an IBM-DTLA-307015 (deskstar 15GB 7200rpm) -- varies from .15s
> with write-cache to 1.8s without.
>
> and an IBM-DTLA-307075 (deskstar 75GB 7200rpm) varies from .03s to 1.67s.
>
> of course 1.8s is nowhere near enough time for 200 writes to complete.

hi, not enough sleep, can't do math.  1.67s is exactly the ballpark you'd
expect for 200 writes to a correctly functioning 7200rpm disk.  and the
travelstar appears to be doing the right thing as well.
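The ballpark can be checked with a little arithmetic, assuming each synchronous rewrite of the same sector costs one full platter revolution (the "slip a rev every time" behaviour mentioned elsewhere in the thread). The function name is hypothetical, and the model ignores command overhead, so it is a lower bound rather than a prediction.

```python
def rewrite_time_seconds(writes: int, rpm: float) -> float:
    """Minimum time for `writes` back-to-back rewrites of one sector.

    Assumes one full revolution per write and no other overhead.
    """
    ms_per_rev = 60_000.0 / rpm
    return writes * ms_per_rev / 1000.0


print(round(rewrite_time_seconds(200, 7200), 2))  # 1.67 (deskstar, 7200rpm)
print(round(rewrite_time_seconds(200, 5411), 2))  # 2.22 (travelstar, 5411rpm)
```

The 7200rpm figure matches the measured 1.67s, and the travelstar's measured 2.91s sits above its 2.22s floor, which is consistent with both drives really waiting for the platter when write caching is off.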

-dean




Re: scsi vs ide performance on fsync's

2001-03-06 Thread dean gaudet

On Tue, 6 Mar 2001, Jonathan Morton wrote:

> Pathological shutdown pattern:  assuming scatter-gather is not allowed
> (for IDE), and a 20ms full-stroke seek, write sectors at alternately
> opposite ends of the disk, working inwards until the buffer is full.
> 512-byte sectors, 2MB of them, is 4000 writes * 20ms = around 80
> seconds

i don't understand why the disk couldn't elevator in this case and be done
in 20ms + rotational.
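
A minimal sketch of the elevator argument, under two stated assumptions that are not from the thread: seek time is proportional to distance travelled, and rotational delay is ignored. The helper names are hypothetical.

```python
FULL_STROKE_S = 0.020  # 20ms full-stroke seek, the figure from the quoted mail


def pathological_order(n):
    """Track numbers alternating between the two ends, working inwards."""
    order = []
    lo, hi = 0, n - 1
    while lo <= hi:
        order.append(lo)
        if lo != hi:
            order.append(hi)
        lo += 1
        hi -= 1
    return order


def total_seek_seconds(order, n_tracks):
    """Total seek time, assuming seek time scales linearly with distance."""
    distance = sum(abs(b - a) for a, b in zip(order, order[1:]))
    return FULL_STROKE_S * distance / (n_tracks - 1)


n = 4096  # 2MB of 512-byte sectors, one sector per track position
fifo = pathological_order(n)
print("as queued:", total_seek_seconds(fifo, n))
print("elevator :", total_seek_seconds(sorted(fifo), n))
```

Under this linear model the queued order costs about 41 seconds rather than 80, because the strokes shrink as the pattern works inwards (the 80 second figure charges a full stroke for every write). Sorted into one sweep, the seeks add up to a single 20ms stroke, which is the point being made here.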

> >Of course, whether you should even trust the harddisk is another question.
>
> I think this result in itself would lead me *not* to trust the hard disk,
> especially an IDE one.  Has anybody tried running this test with a recent
> IBM DeskStar - one of the ones that is the same mech as the equivalent
> UltraStar but with IDE controller?

i assume you meant to time the xlog.c program?  (or did i miss another
program on the thread?)

i've an IBM-DJSA-210 (travelstar 10GB, 5411rpm) which appears to do
*something* with the write cache flag -- it gets 0.10s elapsed real time
in default config; and gets 2.91s if i do "hdparm -W 0".

ditto for an IBM-DTLA-307015 (deskstar 15GB 7200rpm) -- varies from .15s
with write-cache to 1.8s without.

and an IBM-DTLA-307075 (deskstar 75GB 7200rpm) varies from .03s to 1.67s.

of course 1.8s is nowhere near enough time for 200 writes to complete.

so who knows what that flag is doing.

-dean




Re: scsi vs ide performance on fsync's

2001-03-06 Thread Alan Cox

> > I don't know if there is any way to turn off a write buffer on an IDE disk.
> You want a forced set of commands to kill caching at init?

Wrong model

You want a write barrier. Write buffering (at least for short intervals) in
the drive is very sensible. The kernel needs to be able to send drivers a write
barrier which will not be completed with outstanding commands before the
barrier.
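
One way to picture the barrier model is a request queue in which the drive may reorder writes freely between barriers but never across one. The sketch below is only an illustration of that ordering rule; the BARRIER marker and schedule() are hypothetical and do not correspond to any kernel or drive interface.

```python
BARRIER = None  # hypothetical marker for a write barrier in the request stream


def schedule(requests):
    """Reorder writes by sector within each barrier-delimited batch.

    Writes inside a batch may be reordered (here: sorted by sector number),
    but no write queued after a barrier is issued before a write queued
    ahead of that barrier.
    """
    out, batch = [], []
    for req in requests:
        if req is BARRIER:
            out.extend(sorted(batch))
            batch = []
        else:
            batch.append(req)
    out.extend(sorted(batch))
    return out


print(schedule([900, 10, 500, BARRIER, 20, 700, 5]))
# prints [10, 500, 900, 5, 20, 700]
```

The drive keeps the benefit of buffering and seek optimisation inside each batch, while the kernel still gets the ordering guarantee it needs across the barrier.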




Re: scsi vs ide performance on fsync's

2001-03-06 Thread Jonathan Morton

>> It's pretty clear that the IDE drive(r) is *not* waiting for the physical
>> write to take place before returning control to the user program, whereas
>> the SCSI drive(r) is.  Both devices appear to be performing the write
>
>Wrong, IDE does not unplug thus the request is almost, I hate to admit it
>SYNC and not ASYNC :-(  Thus if the drive acks that it has the data then
>the driver lets go.

Uh, run that past me again?  You are saying that because the IDE drive hogs
the bus until the write is complete or the driver forcibly disconnects, you
make the driver disconnect to save time?  Or (more likely) have I totally
misread you...

>> immediately, however (judging from the device activity lights).  Whether
>> this is the correct behaviour or not, I leave up to you kernel hackers...
>
>Seagate has a better seek profile than ibm.
>The second access is correct because the first one pushed the heads to the
>pre-seek.  Thus the question is where is the drive leaving the heads when
>not active?  It does not appear to be in the zone 1 region.

Duh...  I don't quite see what you're saying here, either.  The test is a
continuous rewrite of the same sector of the disk, so the head shouldn't be
moving *at all* until it's all over.  In addition, the drive can't start
writing the sector when it's just finished writing it, so it has to wait
for the rotation to bring it back round again.  Under those circumstances,
I would expect my 7200rpm Seagate to perform slower than my 1rpm IBM
*regardless* of seeking performance.  Seeking doesn't come into it!
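
The xlog.c source is not reproduced in this part of the thread, so the following is only a hypothetical stand-in for the kind of test described above: rewrite the same small record at the same offset and fsync after every write. The function name, record size and file name are assumptions.

```python
import os
import tempfile
import time


def timed_fsync_rewrites(path, count=200, record=b"x" * 512):
    """Rewrite the same file offset `count` times, calling fsync after each write."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.lseek(fd, 0, os.SEEK_SET)  # always go back to the same sector
            os.write(fd, record)
            os.fsync(fd)                  # ask for the data to reach the device
        return time.perf_counter() - start
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    elapsed = timed_fsync_rewrites(os.path.join(tmp, "blah.dat"))
    print(f"200 fsync'd rewrites took {elapsed:.2f}s")
```

On a drive that really commits each write before acknowledging it, the elapsed time should be bounded below by the rotation period times the number of writes; a much smaller number suggests something in the path is caching.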

>> IMHO, if an application needs performance, it shouldn't be syncing disks
>> after every write.  Syncing means, in my book, "wait for the data to be
>> committed to physical media" - note the *wait* involved there - so syncing
>> should only be used where data integrity in the event of a system failure
>> has a much higher importance than performance.
>
>I have only gotten the drive makers in the past 6 months to commit to
>actively updating the contents of the identify page to reflect reality.
>Thus if your drive is one of those that does a stress test check that goes:
>"this bozo did not really mean to turn off write caching, re-enabling <smurk>"

Why does this sound familiar?

Personally, I feel the bottom line is rapidly turning into "if you have
critical data, don't put it on an IDE disk".  There are too many corners
cut when compared to ostensibly similar SCSI devices.  Call me a SCSI bigot
if you like - I realise SCSI is more expensive, but you get what you pay
for.

Of course, under normal circumstances, you leave write-caching and UDMA on,
and you don't use a pathological stress-test like we've been doing.  That
gives the best performance.  But sometimes it's necessary to use these
"pathological" access patterns to achieve certain system functions.
Suppose, harking back to the Windows data-corruption scenario mentioned
earlier, that just before powering off you stuffed several MB of data,
scattered across the disk, into said disk and waited for said disk to say
"yup, i've got that", then powered down.  Recent drives have very large
(2MB?) on-board caches, so how long does it take for a pathological pattern
of these to be committed to physical media?  Can the drive sustain its own
power long enough to do this (highly unlikely)?  So the drive *must* be
able to tell the OS when it's actually committed the data to media, or risk
*serious* data corruption.

Pathological shutdown pattern:  assuming scatter-gather is not allowed (for
IDE), and a 20ms full-stroke seek, write sectors at alternately opposite
ends of the disk, working inwards until the buffer is full.  512-byte
sectors, 2MB of them, is 4000 writes * 20ms = around 80 seconds (not
including rotational delay, either).  Last time I checked, you'd need a
capacitor array the size of the entire computer case to store enough power
to allow the drive to do this after system shutdown, and I don't remember
seeing LiIon batteries strapped to the bottom of my HDs.  Admittedly, any
sane OS doesn't actually use that kind of write pattern on shutdown, but
the drive can't assume that.

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r- y+
-END GEEK CODE BLOCK-





Re: scsi vs ide performance on fsync's

2001-03-06 Thread Jonathan Morton

>> i assume you meant to time the xlog.c program?  (or did i miss another
>> program on the thread?)

Yes.

>> i've an IBM-DJSA-210 (travelstar 10GB, 5411rpm) which appears to do
>> *something* with the write cache flag -- it gets 0.10s elapsed real time
>> in default config; and gets 2.91s if i do "hdparm -W 0".
>>
>> ditto for an IBM-DTLA-307015 (deskstar 15GB 7200rpm) -- varies from .15s
>> with write-cache to 1.8s without.
>>
>> and an IBM-DTLA-307075 (deskstar 75GB 7200rpm) varies from .03s to 1.67s.
>>
>> of course 1.8s is nowhere near enough time for 200 writes to complete.
>
>hi, not enough sleep, can't do math.  1.67s is exactly the ballpark you'd
>expect for 200 writes to a correctly functioning 7200rpm disk.  and the
>travelstar appears to be doing the right thing as well.

I was just about to point that out.  :)  I ran the program with 2000
packets in order to magnify the difference.

So, it appears that the IBM IDE drives are doing the "right thing" when
write-caching is switched off, but the Seagate drive (at least the one I'm
using) appears not to turn the write-caching off at all.  I want to try
this out with some other drives, including a Seagate SCSI drive and a
different Seagate IDE drive (attached to a non-UDMA controller), and
perhaps a couple of older drives which I just happen to have lying around
(particularly a Maxtor and an old TravelStar with very little cache).
That'll have to wait until later, though - university work beckons.  :(

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r- y+
-END GEEK CODE BLOCK-





Re: scsi vs ide performance on fsync's

2001-03-06 Thread Mike Black

Write caching is the culprit for the performance diff:

On IDE:
# time xlog /blah.dat fsync
0.000u 0.190s 0:01.72 11.0% 0+0k 0+0io 91pf+0w
# hdparm -W 0 /dev/hda

/dev/hda:
 setting drive write-caching to 0 (off)
# time xlog /blah.dat fsync
0.000u 0.220s 0:50.60 0.4%  0+0k 0+0io 91pf+0w
# hdparm -W 1 /dev/hda

/dev/hda:
 setting drive write-caching to 1 (on)
# time xlog /blah.dat fsync
0.010u 0.230s 0:01.88 12.7% 0+0k 0+0io 91pf+0w

On my SCSI setup:
# time xlog /usr5/blah.dat fsync
0.020u 0.230s 0:30.48 0.8%  0+0k 0+0io 91pf+0w
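
To put numbers on the difference, the elapsed column of the timings above can be compared directly. This is a small sketch that assumes the csh-style `time` layout shown (elapsed time is the third field, in m:ss.ss form); the helper name is made up.

```python
def elapsed_seconds(time_line):
    """Extract the elapsed field (third column, m:ss.ss) from csh-style `time` output."""
    minutes, seconds = time_line.split()[2].split(":")
    return int(minutes) * 60 + float(seconds)


cache_on = elapsed_seconds("0.000u 0.190s 0:01.72 11.0% 0+0k 0+0io 91pf+0w")
cache_off = elapsed_seconds("0.000u 0.220s 0:50.60 0.4%  0+0k 0+0io 91pf+0w")
scsi = elapsed_seconds("0.020u 0.230s 0:30.48 0.8%  0+0k 0+0io 91pf+0w")

print(f"IDE, cache off vs on:   {cache_off / cache_on:.1f}x slower")
print(f"IDE cache off vs SCSI:  {cache_off / scsi:.2f}x slower")
```

With the figures above, turning the IDE write cache off makes the run about 29 times slower, and puts it within a factor of two of the SCSI result.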



Michael D. Black   Principal Engineer
[EMAIL PROTECTED]  321-676-2923,x203
http://www.csihq.com  Computer Science Innovations
http://www.csihq.com/~mike  My home page
FAX 321-676-2355
- Original Message -
From: "Andre Hedrick" [EMAIL PROTECTED]
To: "Linus Torvalds" [EMAIL PROTECTED]
Cc: "Douglas Gilbert" [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Tuesday, March 06, 2001 2:12 AM
Subject: Re: scsi vs ide performance on fsync's


On Mon, 5 Mar 2001, Linus Torvalds wrote:

 Well, it's fairly hard for the kernel to do much about that - it's almost
 certainly just IDE doing write buffering on the disk itself. No OS
 involved.

I am pushing for WC to be defaulted in the off state, but as you know I
have a bigger fight than caching on my hands...

 I don't know if there is any way to turn of a write buffer on an IDE disk.

You want a forced set of commands to kill caching at init?

Andre Hedrick
Linux ATA Development
ASL Kernel Development

-
ASL, Inc.                     Toll free: 1-877-ASL-3535
1757 Houret Court             Fax: 1-408-941-2071
Milpitas, CA 95035            Web: www.aslab.com




Re: scsi vs ide performance on fsync's

2001-03-06 Thread Jonathan Morton

>> Pathological shutdown pattern:  assuming scatter-gather is not allowed (for
>> IDE), and a 20ms full-stroke seek, write sectors at alternately opposite
>> ends of the disk, working inwards until the buffer is full.  512-byte
>> sectors, 2MB of them, is 4000 writes * 20ms = around 80 seconds (not
>> including rotational delay, either).  Last time I checked, you'd need a
>> capacitor array the size of the entire computer case to store enough power
>> to allow the drive to do this after system shutdown, and I don't remember
>> seeing LiIon batteries strapped to the bottom of my HDs.  Admittedly, any
>> sane OS doesn't actually use that kind of write pattern on shutdown, but
>> the drive can't assume that.
>
>But since the drive has everything in cache, it can just write
>out both bunches of sectors in an order which minimises disk
>seek time ...
>
>(yes, the drives don't guarantee write ordering either, but that
>shouldn't come as a big surprise when they don't guarantee that
>data makes it to disk ;))

That would be true for SCSI devices - I understand the controllers and/or
drives support "scatter-gather" which allows a drive to optimise its seek
pattern in the manner you describe.  However, I'm not sure whether an IDE
drive is allowed to do this.  I'm reasonably sure that I heard somewhere
that IDE drives have to complete transactions in the specified order as far
as the host is concerned - what I'm unsure of is whether this also applies
to mechanical head movement.

If not, then the drive could by all means optimise the access pattern
provided it acked the data or provided the results in the same order as the
instructions were given.  This would probably shorten the time for a new
pathological set (distributed evenly across the disk surface, but all on
the worst-possible angular offset compared to the previous) to (8ms seek
time + 5ms rotational delay) * 4000 writes ~= 52 seconds (compared with
around 120 seconds for the previous set with rotational delay factored in).
Great, so you only need half as big a power store to guarantee writing that
much data, but it's still too much.  Even with a 15000rpm drive and 5ms
seek times, it would still be too much.
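
The estimates in this paragraph follow from one formula, writes * (seek + rotational delay). The sketch below reproduces them; the 10ms rotational delay used to recover the 120 second figure and the 4ms figure for a 15000rpm drive (one full revolution) are assumptions made here, not numbers stated in the mail.

```python
WRITES = 4000  # the round figure used above for 2MB of 512-byte sectors


def flush_seconds(seek_ms, rotational_ms, writes=WRITES):
    """Time to commit `writes` scattered sectors, one seek plus one rotational wait each."""
    return writes * (seek_ms + rotational_ms) / 1000.0


print(flush_seconds(20, 0))   # full-stroke seeks only: the original 80s estimate
print(flush_seconds(20, 10))  # assumed 10ms rotational delay: the ~120s figure
print(flush_seconds(8, 5))    # reordered pattern: 8ms seek + 5ms rotational
print(flush_seconds(5, 4))    # 15000rpm, 5ms seek, assumed 4ms (one revolution)
```

Even the most favourable case here comes out at tens of seconds, far longer than a drive could plausibly keep itself powered after the supply is cut.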

The OS needs to know the physical act of writing data has finished before
it tells the m/board to cut the power - period.  Pathological data sets
included - they are the worst case which every engineer must take into
account.  Out of interest, does Linux guarantee this, in the light of what
we've uncovered?  If so, perhaps it could use the same technique to fix
fdatasync() and family...

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r- y+
-END GEEK CODE BLOCK-





Re: scsi vs ide performance on fsync's

2001-03-06 Thread Jeremy Hansen


Ahh, now we're getting somewhere.

IDE:

jeremy:~# time ./xlog file.out fsync

real    0m33.739s
user    0m0.010s
sys     0m0.120s


so now this corresponds to the performance we're seeing on SCSI.

So I guess what I'm wondering now is can or should anything be done about
this on the SCSI side?

Thanks
-jeremy

On Tue, 6 Mar 2001, Mike Black wrote:

 Write caching is the culprit for the performance diff:

 On IDE:
 time xlog /blah.dat fsync
 0.000u 0.190s 0:01.72 11.0% 0+0k 0+0io 91pf+0w
 # hdparm -W 0 /dev/hda

 /dev/hda:
  setting drive write-caching to 0 (off)
 # time xlog /blah.dat fsync
 0.000u 0.220s 0:50.60 0.4%  0+0k 0+0io 91pf+0w
 # hdparm -W 1 /dev/hda

 /dev/hda:
  setting drive write-caching to 1 (on)
 # time xlog /blah.dat fsync
 0.010u 0.230s 0:01.88 12.7% 0+0k 0+0io 91pf+0w

 On my SCSI setup:
 # time xlog /usr5/blah.dat fsync
 0.020u 0.230s 0:30.48 0.8%  0+0k 0+0io 91pf+0w


 
 Michael D. Black   Principal Engineer
 [EMAIL PROTECTED]  321-676-2923,x203
 http://www.csihq.com  Computer Science Innovations
 http://www.csihq.com/~mike  My home page
 FAX 321-676-2355
 - Original Message -
 From: "Andre Hedrick" [EMAIL PROTECTED]
 To: "Linus Torvalds" [EMAIL PROTECTED]
 Cc: "Douglas Gilbert" [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Sent: Tuesday, March 06, 2001 2:12 AM
 Subject: Re: scsi vs ide performance on fsync's


 On Mon, 5 Mar 2001, Linus Torvalds wrote:

  Well, it's fairly hard for the kernel to do much about that - it's almost
  certainly just IDE doing write buffering on the disk itself. No OS
  involved.

 I am pushing for WC to be defaulted in the off state, but as you know I
 have a bigger fight than caching on my hands...

  I don't know if there is any way to turn of a write buffer on an IDE disk.

 You want a forced set of commands to kill caching at init?

 Andre Hedrick
 Linux ATA Development
 ASL Kernel Development
 
 -
 ASL, Inc. Toll free: 1-877-ASL-3535
 1757 Houret Court Fax: 1-408-941-2071
 Milpitas, CA 95035Web: www.aslab.com



-- 
this is my sig.




Re: scsi vs ide performance on fsync's

2001-03-06 Thread Jonathan Morton

On Tue, 6 Mar 2001, Mike Black wrote:

> Write caching is the culprit for the performance diff:

Indeed, and my during-the-boring-lecture benchmark on my 18Gb IBM
TravelStar bears this out.  I was confused earlier by the fact that one of
my Seagate drives blatantly ignores the no-write-caching request I sent it.
:P


At 4:02 pm + 6/3/2001, Jeremy Hansen wrote:

>Ahh, now we're getting somewhere.
>
>so now this corresponds to the performance we're seeing on SCSI.
>
>So I guess what I'm wondering now is can or should anything be done about
>this on the SCSI side?

Maybe, it depends on your perspective.  In my personal opinion, the IDE
behaviour is incorrect and some way of dealing with it (while still
retaining the benefits of write-caching for normal applications) would be
highly desirable.  However, some applications may like or partially rely on
that behaviour, to gain better on-disk data consistency while not suffering
too much in performance (eg. the transaction database mentioned by at least
one poster).

The way to make all parties happy is to fix the IDE driver (or drives!) and
make sure an *alternative* syscall is available which flushes the buffers
asynchronously, as per the current IDE behaviour.  It shouldn't be too hard
to make the SCSI driver use that behaviour in the alternative syscall
(which may already exist, I don't know Linux well enough to say).

May this be a warning to all hardware manufacturers who "tweak" their
hardware to gain better benchmark results without actually increasing
performance - you *will* be found out!

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r- y+
-END GEEK CODE BLOCK-





Re: scsi vs ide performance on fsync's

2001-03-06 Thread David Balazic

(( please CC me , not subscribed , [EMAIL PROTECTED] )

Jonathan Morton ([EMAIL PROTECTED]) wrote :

> The OS needs to know the physical act of writing data has finished before
> it tells the m/board to cut the power - period. Pathological data sets
> included - they are the worst case which every engineer must take into
> account. Out of interest, does Linux guarantee this, in the light of what
> we've uncovered? If so, perhaps it could use the same technique to fix
> fdatasync() and family...

Linux currently ignores write-cache, AFAICT.
Recently I asked a similar question , about flushing drive caches at shutdown :
Subject : "Flusing caches on shutdown"
message archived at :
http://boudicca.tux.org/hypermail/linux-kernel/2001week08/0157.html
Body attached at end of this message.

The answer ( and only reply ) was :
[ archived at : http://boudicca.tux.org/hypermail/linux-kernel/2001week08/0211.html ]
--- begin quote ---
From: Ingo Oeser ([EMAIL PROTECTED])

On Mon, Feb 19, 2001 at 01:45:57PM +0100, David Balazic wrote:
> It is a good idea IMO to flush the write cache of storage devices
> at shutdown and other critical moments.

Not needed. All device drivers should disable write caches of
their devices, that need another signal than switching it off by
the power button to flush themselves.

> Losing data at powerdown due to write caches has been reported,
> so this is not a theoretical problem. Also the journaled filesystems
> are safe only in theory if the journal is not stored on non-volatile
> memory, which is not guaranteed in the current kernel.
 
Fine. If users/admins have write caching enabled, they either
know what they do, or should disable it (which is the default for
all mass storage drivers AFAIK).
 
Hardware Level caching is only good for OSes which have broken
drivers and broken caching (like plain old DOS).

Linux does a good job in caching and cache control at software
level.
 
Regards

Ingo Oeser
--- end quote ---

My original mail :
--- begin quote ---
   (( CC me the replies, as I'm not subscribed to LKML ))

   Hi! 
 
   It is a good idea IMO to flush the write cache of storage devices
   at shutdown and other critical moments.
   I browsed through linux-2.4.1 and see no use of the SYNCHRONIZE CACHE
   SCSI command ( curiously it is defined in several other files
   besides include/scsi/scsi.h , grep returns :
   drivers/scsi/pci2000.h:#define SCSIOP_SYNCHRONIZE_CACHE 0x35
   drivers/scsi/psi_dale.h:#define SCSIOP_SYNCHRONIZE_CACHE 0x35
   drivers/scsi/psi240i.h:#define SCSIOP_SYNCHRONIZE_CACHE 0x35
   ) 

   I couldn't find evidence of the use of the equivalent ATA command either
   ( FLUSH CACHE , command code E7h ).
   Also add ATAPI to the list. ( and all other interfaces. I checked just SCSI
   and ATA )

   Losing data at powerdown due to write caches has been reported,
   so this is not a theoretical problem. Also the journaled filesystems
   are safe only in theory if the journal is not stored on non-volatile
   memory, which is not guaranteed in the current kernel.

   What is the official word on this issue ?
   I think this is important to the "enterprise" guys, at the least.
   
   Sincerely,
   david 
   
   PS: CC me , as I'm not subscribed to LKML
--- end quote ---
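
For reference, a minimal sketch of what the SCSI command named in the
original mail looks like on the wire. The opcode 0x35 and the ATA command
code E7h are the values quoted in the mail; the 10-byte CDB layout is the
SYNCHRONIZE CACHE (10) layout as I understand it from the SCSI block command
set, and the function name is made up for illustration. This only builds
the bytes; actually sending them to a device (for example through a
pass-through ioctl) is not shown.

```python
import struct

SCSI_SYNCHRONIZE_CACHE_10 = 0x35  # opcode quoted in the mail above
ATA_FLUSH_CACHE = 0xE7            # ATA command code quoted in the mail above


def build_sync_cache_cdb(lba=0, num_blocks=0, immed=False):
    """Build a 10-byte SYNCHRONIZE CACHE (10) command descriptor block.

    Hypothetical helper for illustration. With lba=0 and num_blocks=0 the
    command covers every block from the start of the medium to its end,
    i.e. "flush everything".
    """
    # IMMED (bit 1 of byte 1): return status before the flush has finished.
    flags = 0x02 if immed else 0x00
    # Layout: opcode, flags, 32-bit LBA, group number, 16-bit block count,
    # control byte. All multi-byte fields are big-endian.
    return struct.pack(">BBIBHB", SCSI_SYNCHRONIZE_CACHE_10, flags,
                       lba, 0, num_blocks, 0)


cdb = build_sync_cache_cdb()
print(cdb.hex())
```

For a shutdown path one would presumably leave IMMED clear, so that the
command only completes once the data is on the medium.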

-- 
David Balazic
--
"Be excellent to each other." - Bill & Ted
- - - - - - - - - - - - - - - - - - - - - -




Re: scsi vs ide performance on fsync's

2001-03-06 Thread Gregory Maxwell

On Tue, Mar 06, 2001 at 06:14:15PM +0100, David Balazic wrote:
[snip]
> Hardware Level caching is only good for OSes which have broken
> drivers and broken caching (like plain old DOS).
>
> Linux does a good job in caching and cache control at software
> level.

Read caching, yes. But for writes, the drive can often do a lot more
optimization because of its synchronous operation with the platter and
greater knowledge of internal disk geometry.

What would be useful, as Alan said, is a barrier operation.



Re: scsi vs ide performance on fsync's

2001-03-06 Thread Jonathan Morton

> Jonathan Morton ([EMAIL PROTECTED]) wrote :
>
> > The OS needs to know the physical act of writing data has finished before
> > it tells the m/board to cut the power - period. Pathological data sets
> > included - they are the worst case which every engineer must take into
> > account. Out of interest, does Linux guarantee this, in the light of what
> > we've uncovered? If so, perhaps it could use the same technique to fix
> > fdatasync() and family...
>
> Linux currently ignores write-cache, AFAICT.
> Recently I asked a similar question , about flushing drive caches at
> shutdown :
>
> On Mon, Feb 19, 2001 at 01:45:57PM +0100, David Balazic wrote:
>
> > It is a good idea IMO to flush the write cache of storage devices
> > at shutdown and other critical moments.
>
> Not needed. All device drivers should disable write caches of
> their devices, that need another signal than switching it off by
> the power button to flush themselves.

Sounds like a sensible place to implement it - in the device driver.  I
also note the existence of an ATA flush-buffer command, which should
probably be used in sync() and family.  The call(s) to the sync() family on
shutdown should probably be performed by the filesystem itself on unmount
(or remount read-only), and if journalled filesystems need synchronisation
they should use sync() (or a more fine-grained version) themselves as
necessary.

Doesn't sound like too much of a headache to implement, to me - unless some
drives ignore the ATA FLUSH command, in which case said drives can be
considered seriously broken.  :P  I don't agree that write-caching in
itself is a bad thing, particularly given the amount of CPU overhead that
IDE drives demand while attached to the controller (orders of magnitude
higher than a good SCSI controller) - the more overhead we can hand off to
dedicated hardware, the better.  What does matter is that drives
implementing write-caching are handled in a safe and efficient manner,
especially in cases where they refuse to turn such caching off (eg. my
Seagate Barracuda *glares at drive*).

Recalling my recent comments on worst-case drive-shutdown timings, I also
remember seeing drives with 18ms *average* seek times quite recently - this
was a Quantum Bigfoot (yes, a 5.25" HD), found in a low-end Compaq desktop
- if anyone still believes Compaq makes high-quality machines for their
low-end market, they're totally mistaken.  The machine sped up quite a lot
when a new 3.5" IBM DeskStar was installed, with an 8.5ms average seek and
an almost doubling in rotational speed.  :)

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r- y+
-END GEEK CODE BLOCK-





Re: scsi vs ide performance on fsync's

2001-03-06 Thread Linus Torvalds



On Tue, 6 Mar 2001, Alan Cox wrote:

> > > I don't know if there is any way to turn off a write buffer on an IDE disk.
> > You want a forced set of commands to kill caching at init?
>
> Wrong model
>
> You want a write barrier. Write buffering (at least for short intervals) in
> the drive is very sensible. The kernel needs to be able to send drivers a write
> barrier which will not be completed with outstanding commands before the
> barrier.

Agreed.

Write buffering is incredibly useful on a disk - for all the same reasons
that an OS wants to do it. The disk can use write buffering to speed up
writes a lot - not just lower the _perceived_ latency by the OS, but to
actually improve performance too.

But Alan is right - we need a "sync" command or something. I don't know
if IDE has one (it already might, for all I know).

Linus




Re: scsi vs ide performance on fsync's

2001-03-06 Thread David Balazic

Linus Torvalds himself wrote :

> On Tue, 6 Mar 2001, Alan Cox wrote:
>
> > > > I don't know if there is any way to turn off a write buffer on an IDE disk.
> > > You want a forced set of commands to kill caching at init?
> >
> > Wrong model
> >
> > You want a write barrier. Write buffering (at least for short intervals) in
> > the drive is very sensible. The kernel needs to be able to send drivers a write
> > barrier which will not be completed with outstanding commands before the
> > barrier.
>
> Agreed.
>
> Write buffering is incredibly useful on a disk - for all the same reasons
> that an OS wants to do it. The disk can use write buffering to speed up
> writes a lot - not just lower the _perceived_ latency by the OS, but to
> actually improve performance too.
>
> But Alan is right - we need a "sync" command or something. I don't know
> if IDE has one (it already might, for all I know).

ATA , SCSI and ATAPI all have a FLUSH_CACHE command. (*)
Whether the drives implement it is another question ...

(*) references : 
  ATA-6 draft standard from www.t13.org
  MtFuji document from 


-- 
David Balazic
--
"Be excellent to each other." - Bill & Ted
- - - - - - - - - - - - - - - - - - - - - -



Re: scsi vs ide performance on fsync's

2001-03-06 Thread Andre Hedrick


Jonathan,

I am not going to bite on your flame bait, and you are free to waste your money.

On Tue, 6 Mar 2001, Jonathan Morton wrote:

> > > It's pretty clear that the IDE drive(r) is *not* waiting for the physical
> > > write to take place before returning control to the user program, whereas
> > > the SCSI drive(r) is.  Both devices appear to be performing the write
> >
> > Wrong, IDE does not unplug thus the request is almost, I hate to admit it
> > SYNC and not ASYNC :-(  Thus if the drive acks that it has the data then
> > the driver lets go.
>
> Uh, run that past me again?  You are saying that because the IDE drive hogs
> the bus until the write is complete or the driver forcibly disconnects, you
> make the driver disconnect to save time?  Or (more likely) have I totally
> misread you...

No, SCSI does with queuing.
I am saying that the ata/ide driver rips the heart out of the
io_request_lock way too darn long.  This means that upon execution of a
request virtually all interrupts are whacked and the driver is dominating
the system.  Given that IO's are limited to 128 sectors or one DMA PRD,
this is vastly smaller than the SCSI transfer limit.

Since you are not using the test "Write Verify Read" all drives are going
to lie.  Only this command will force the stuff to hit the platters and
return a read out of the dirty-cache.

> > pre-seek.  Thus the question is where is the drive leaving the heads when
> > not active?  It does not appear to be in the zone 1 region.
>
> Duh...  I don't quite see what you're saying here, either.  The test is a

Okay, real short: limit to two zones that are equal in size.
The inner and outer, and the latter will cover more physical media than
the former.  Simple two-zone model.

> continuous rewrite of the same sector of the disk, so the head shouldn't be
> moving *at all* until it's all over.  In addition, the drive can't start

True, and you slip a rev. every time.

> writing the sector when it's just finished writing it, so it has to wait
> for the rotation to bring it back round again.  Under those circumstances,
> I would expect my 7200rpm Seagate to perform slower than my 10000rpm IBM
> *regardless* of seeking performance.  Seeking doesn't come into it!

It does, because more RPM means more air-flow and more work to keep the
position stable.

> > Thus if your drive is one of those that does a stress test check that goes:
> > "this bozo did not really mean to turn off write caching, re-enabling <smirk>"
>
> Why does this sound familiar?

Because of WinBench!
All the prefetch/caching are modeled to be optimized to that bench-mark.

> Personally, I feel the bottom line is rapidly turning into "if you have
> critical data, don't put it on an IDE disk".  There are too many corners
> cut when compared to ostensibly similar SCSI devices.  Call me a SCSI bigot
> if you like - I realise SCSI is more expensive, but you get what you pay
> for.

Let me slap you in the face with a salami stick!
ATA 7200 RPM Drives are using SCSI 7200 RPM Drive HDA's
So you say ATA is Lame?  Then so were your SCSI 7200's.

> Of course, under normal circumstances, you leave write-caching and UDMA on,
> and you don't use a pathological stress-test like we've been doing.  That
> gives the best performance.  But sometimes it's necessary to use these
> "pathological" access patterns to achieve certain system functions.
> Suppose, harking back to the Windows data-corruption scenario mentioned
> earlier, that just before powering off you stuffed several MB of data,
> scattered across the disk, into said disk and waited for said disk to say
> "yup, i've got that", then powered down.  Recent drives have very large
> (2MB?) on-board caches, so how long does it take for a pathological pattern
> of these to be committed to physical media?  Can the drive sustain its own
> power long enough to do this (highly unlikely)?  So the drive *must* be
> able to tell the OS when it's actually committed the data to media, or risk
> *serious* data corruption.

OH...you are talking about the one IBM drive that is goat-screwed...
The one that is too stupid to use the energy of the platters to drop the
data in the vendor power-down strip...yet it dumps the buffer in a panic..

ERM, that is a bad drive, regardless if they publish an errata that states
only good HOSTS that issue a flush-cache prior to power-off are to be
certified...well, maybe if they did not default the WC on then it would be a
NOP of the design error.  Since all OSes that enable WC at init will flush
it at shutdown and do a periodic purge with in-activity.

> Pathological shutdown pattern:  assuming scatter-gather is not allowed (for
> IDE), and a 20ms full-stroke seek, write sectors at alternately opposite
> ends of the disk, working inwards until the buffer is full.  512-byte
> sectors, 2MB of them, is 4000 writes * 20ms = around 80 seconds (not
> including rotational delay, either).  Last time I checked, you'd need a
> capacitor array the size of the entire computer case to store enough power
> to allow the drive to do this after system
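
The arithmetic in the quoted "pathological shutdown pattern" can be checked
with a few lines. This is only a sketch of that estimate, using the figures
from the quote (2MB cache, 512-byte sectors, 20ms per full-stroke seek) and
taking 2MB as 2 * 1024 * 1024 bytes, which is an assumption. The function
name is made up for illustration, and rotational delay is left out, as in
the quote.

```python
def worst_case_flush(cache_bytes, sector_bytes, seek_ms):
    """Return (number of writes, seconds) if every cached sector costs one
    full-stroke seek. Rotational latency is deliberately ignored."""
    writes = cache_bytes // sector_bytes
    seconds = writes * seek_ms / 1000.0
    return writes, seconds


writes, seconds = worst_case_flush(2 * 1024 * 1024, 512, 20)
print(writes, "writes,", seconds, "seconds")
```

With these inputs the count is 4096 writes and a little under 82 seconds,
which matches the "around 80 seconds" in the quote.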

Re: scsi vs ide performance on fsync's

2001-03-06 Thread Jens Axboe

On Tue, Mar 06 2001, David Balazic wrote:
> > > Wrong model
> > >
> > > You want a write barrier. Write buffering (at least for short intervals)
> > > in the drive is very sensible. The kernel needs to be able to send
> > > drivers a write barrier which will not be completed with outstanding
> > > commands before the barrier.
> >
> > Agreed.
> >
> > Write buffering is incredibly useful on a disk - for all the same reasons
> > that an OS wants to do it. The disk can use write buffering to speed up
> > writes a lot - not just lower the _perceived_ latency by the OS, but to
> > actually improve performance too.
> >
> > But Alan is right - we need a "sync" command or something. I don't know
> > if IDE has one (it already might, for all I know).
>
> ATA , SCSI and ATAPI all have a FLUSH_CACHE command. (*)
> Whether the drives implement it is another question ...

(Usually called SYNCHRONIZE_CACHE btw)

SCSI has ordered tag, which fit the model Alan described quite nicely.
I've been meaning to implement this for some time, it would be handy
for journalled fs to use such a barrier. Since ATA doesn't do queueing
(at least not in current Linux), a synchronize cache is probably the
only way to go there.
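
To make the barrier idea concrete, here is a toy model, not kernel code. It
assumes a drive that is free to reorder buffered writes (modelled here,
arbitrarily, as sorting by sector number) but must commit everything queued
before a barrier ahead of anything queued after it. The command tuples and
the function name are invented for the illustration.

```python
def drive_commit_order(commands):
    """commands: list of ("write", sector) or ("barrier",) in submission order.

    Returns the sectors in the order a reordering drive might commit them.
    Writes between two barriers may be reordered freely; writes never cross
    a barrier.
    """
    committed = []
    segment = []
    for cmd in commands:
        if cmd[0] == "barrier":
            committed.extend(sorted(segment))  # reorder within the segment only
            segment = []
        else:
            segment.append(cmd[1])
    committed.extend(sorted(segment))
    return committed


# A journalled fs writes two journal blocks (sectors 900, 901), then a
# commit record that happens to live at a low sector number (10).
no_barrier = [("write", 900), ("write", 901), ("write", 10)]
with_barrier = [("write", 900), ("write", 901), ("barrier",), ("write", 10)]

print(drive_commit_order(no_barrier))    # commit record may land first
print(drive_commit_order(with_barrier))  # journal blocks always land first
```

Without the barrier the commit record can reach the platter before the
journal blocks it describes; with it the ordering holds, and nothing has to
wait for completion. A synchronize/flush cache command gives the same
ordering guarantee but only by stalling until the whole cache is written.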

> (*) references :
>   ATA-6 draft standard from www.t13.org
>   MtFuji document from

ftp.avc-pioneer.com

-- 
Jens Axboe




Re: scsi vs ide performance on fsync's

2001-03-06 Thread Mark Hahn

> itself is a bad thing, particularly given the amount of CPU overhead that
> IDE drives demand while attached to the controller (orders of magnitude
> higher than a good SCSI controller) - the more overhead we can hand off to

I know this is just a troll by a scsi-believer, but I'm biting anyway.

on current machines and disks, ide costs a few % CPU, depending on 
which CPU, disk, kernel, the sustained bandwidth, etc.  I've measured
this using the now-trendy method of noticing how much the IO costs
a separate, CPU-bound benchmark: load = 1 - (unloadedPerf / loadedPerf).
my cheesy duron/600 desktop typically shows ~2% actual cost when running
bonnie's block IO tests.
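
A small sketch of the load formula above. The mail does not say what unit
"Perf" is in; the formula gives a positive load only if it is something
where lower is better, so this sketch assumes it is the elapsed time of the
CPU-bound benchmark. The function name and the sample timings are made up
for illustration.

```python
def io_cpu_load(unloaded_seconds, loaded_seconds):
    """Fraction of CPU taken by the IO, estimated from how much a separate
    CPU-bound benchmark slows down while the IO is running.

    Assumes both arguments are elapsed times of the same benchmark run:
    load = 1 - (unloaded / loaded).
    """
    return 1.0 - unloaded_seconds / loaded_seconds


# Hypothetical numbers: 100.0 s alone, 102.0 s with disk IO in the background.
load = io_cpu_load(100.0, 102.0)
print("%.1f%%" % (load * 100))
```

If "Perf" were a throughput figure instead (higher is better), the ratio
would have to be inverted to get the same result.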




Re: scsi vs ide performance on fsync's

2001-03-06 Thread Jonathan Morton

> I am not going to bite on your flame bait, and you are free to waste your money.

I don't flamebait.  I was trying to clear up some confusion...

> No, SCSI does with queuing.
> I am saying that the ata/ide driver rips the heart out of the
> io_request_lock way too darn long.  This means that upon execution of a
> request virtually all interrupts are whacked and the driver is dominating
> the system.  Given that IO's are limited to 128 sectors or one DMA PRD,
> this is vastly smaller than the SCSI transfer limit.

Ah, so the ATA driver hogs interrupts.  Nice.  Kinda explains why I can't
use the mouse on some systems when I use cdparanoia.

> Okay, real short: limit to two zones that are equal in size.
> The inner and outer, and the latter will cover more physical media than
> the former.  Simple two-zone model.

Still doesn't make a difference - there is one revolution between writes,
no matter where on disk it is.

> > Under those circumstances,
> > I would expect my 7200rpm Seagate to perform slower than my 10000rpm IBM
> > *regardless* of seeking performance.  Seeking doesn't come into it!
>
> It does, because more RPM means more air-flow and more work to keep the
> position stable.

That's the engineers' problem, not ours.  In fact, it's not really a
problem because my IBM drive gave almost exactly the correct performance
result, even at 10000rpm, therefore it's managing to keep the position
stable regardless of airflow.

> > Why does this sound familiar?
>
> Because of WinBench!
> All the prefetch/caching are modeled to be optimized to that bench-mark.

Lies, damn lies, statistics, benchmarks, delivery dates.  Especially a
consumer-oriented benchmark like WinBench.  It's perfectly natural to
optimise for particular access patterns, but IMHO that doesn't excuse
breaking the drive just to get a better benchmark score.

> > Personally, I feel the bottom line is rapidly turning into "if you have
> > critical data, don't put it on an IDE disk".  There are too many corners
> > cut when compared to ostensibly similar SCSI devices.  Call me a SCSI bigot
> > if you like - I realise SCSI is more expensive, but you get what you pay
> > for.
>
> Let me slap you in the face with a salami stick!
> ATA 7200 RPM Drives are using SCSI 7200 RPM Drive HDA's
> So you say ATA is Lame?  Then so were your SCSI 7200's.

That isn't the point!  I'm not talking about the physical mechanism, which
indeed is often the same between one generation of SCSI and the next
generation of IDE devices.  I'm talking about the IDE controller which is
slapped on the bottom of said mechanism.  The mech can be of world-class
quality, but if the controller is shot it doesn't cut the grain.

> Since all OSes that enable WC at init will flush
> it at shutdown and do a periodic purge with in-activity.

But Linux doesn't, as has been pointed out earlier.  We need to fix Linux.
Also, as I and someone else have also pointed out, there are drives in
circulation which refuse to turn off write caching, including one sitting
in my main workstation - the one which is rebooted the most often, simply
because I need to use Windoze 95 for a few onerous tasks.  I haven't
suffered disk corruption yet, because Linux unmounts the filesystems and
flushes its own buffers several seconds before powering down, and uses a
non-pathological access pattern, but I sure don't want to see the first
time this doesn't work properly.

> Err, last time I checked all good devices flush their write caching on their
> own to take advantage of having a maximum cache for prefetching.

Which doesn't work if the buffer is filled up by the OS 0.5 seconds before
the power goes.

I'm sorry if this looks like another troll, but I really do like to clear
up confusion.  I do accept that IDE now has good enough real performance
for many purposes, but in terms of enforced quality it clearly lags behind
the entire SCSI field.

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r- y+
-END GEEK CODE BLOCK-





Re: scsi vs ide performance on fsync's

2001-03-06 Thread Andre Hedrick

On Wed, 7 Mar 2001, Jonathan Morton wrote:

> Still doesn't make a difference - there is one revolution between writes,
> no matter where on disk it is.

Oh it does, because you are hitting the same sector with the same data.
Rotate your buffer and then you will see the difference.

> > Because of WinBench!
> > All the prefetch/caching are modeled to be optimized to that bench-mark.
>
> Lies, damn lies, statistics, benchmarks, delivery dates.  Especially a
> consumer-oriented benchmark like WinBench.  It's perfectly natural to
> optimise for particular access patterns, but IMHO that doesn't excuse
> breaking the drive just to get a better benchmark score.

Obviously you have never been in the bowels of drive industry hell.
Why do you think there was a change in ATA-6 to require the
Write-Verify-Read to always return stuff from the platter?
Because the SOB's in storage LIE!  A real wake-up call for you is that
everything about the world of storage is a big-fat-whopper of a LIE.

Storage devices are BLACK-BOXES with the standards/rules to communicate
being dictated by the device not the host.  Storage devices are no better
than a Coke(tm) vending machine.  You push "Coke" it gives you "Coke".
You have not a clue as to how it arrives or where it came from.
Same thing about reading from a drive.

> That isn't the point!  I'm not talking about the physical mechanism, which
> indeed is often the same between one generation of SCSI and the next
> generation of IDE devices.  I'm talking about the IDE controller which is
> slapped on the bottom of said mechanism.  The mech can be of world-class
> quality, but if the controller is shot it doesn't cut the grain.

So there is a $5 difference in the cell-gates and the line drivers are more
powerful,  80GB ATA + $5 != 80GB SCSI.

> > Since all OSes that enable WC at init will flush
> > it at shutdown and do a periodic purge with in-activity.
>
> But Linux doesn't, as has been pointed out earlier.  We need to fix Linux.

Friend I have fixed this some time ago but it is bundled with TASKFILE
that is not going to arrive until 2.5.  Because I need a way to execute
this and hold the driver until it is complete, regardless of the shutdown
method.

> > Err, last time I checked all good devices flush their write caching on their
> > own to take advantage of having a maximum cache for prefetching.
>
> Which doesn't work if the buffer is filled up by the OS 0.5 seconds before
> the power goes.

Maybe that is why there is a vendor disk-cache dump zone on the edge of
the platters...just maybe you need to buy your drives from somebody that
does this and has a predictive sector stretcher as the energy from the
inertia by the DC three-phase motor executes the dump.

Ever wondered why modern drives have open collectors on the data bus?
Maybe to disconnect the power draw so that the motor now generator
provides the needed power to complete the data dump...

> I'm sorry if this looks like another troll, but I really do like to clear
> up confusion.  I do accept that IDE now has good enough real performance
> for many purposes, but in terms of enforced quality it clearly lags behind
> the entire SCSI field.

I have no desire to debate the merits, but when your onboard host for ATA
starts shipping with GigaBit-Copper speeds then we can have a pissing
contest.

Cheers,

Andre Hedrick
Linux ATA Development
ASL Kernel Development
-
ASL, Inc. Toll free: 1-877-ASL-3535
1757 Houret Court Fax: 1-408-941-2071
Milpitas, CA 95035Web: www.aslab.com




Re: scsi vs ide performance on fsync's

2001-03-05 Thread Andre Hedrick

On Mon, 5 Mar 2001, Linus Torvalds wrote:

> Well, it's fairly hard for the kernel to do much about that - it's almost
> certainly just IDE doing write buffering on the disk itself. No OS
> involved.

I am pushing for WC to be defaulted in the off state, but as you know I
have a bigger fight than caching on my hands...

> I don't know if there is any way to turn off a write buffer on an IDE disk.

You want a forced set of commands to kill caching at init?

Andre Hedrick
Linux ATA Development
ASL Kernel Development
-
ASL, Inc. Toll free: 1-877-ASL-3535
1757 Houret Court Fax: 1-408-941-2071
Milpitas, CA 95035Web: www.aslab.com




Re: scsi vs ide performance on fsync's

2001-03-05 Thread Andre Hedrick

On Tue, 6 Mar 2001, Jonathan Morton wrote:

> It's pretty clear that the IDE drive(r) is *not* waiting for the physical
> write to take place before returning control to the user program, whereas
> the SCSI drive(r) is.  Both devices appear to be performing the write

Wrong, IDE does not unplug thus the request is almost, I hate to admit it
SYNC and not ASYNC :-(  Thus if the drive acks that it has the data then
the driver lets go.

> immediately, however (judging from the device activity lights).  Whether
> this is the correct behaviour or not, I leave up to you kernel hackers...

Seagate has a better seek profile than ibm.
The second access is correct because the first one pushed the heads to the
pre-seek.  Thus the question is where is the drive leaving the heads when
not active?  It does not appear to be in the zone 1 region.

> IMHO, if an application needs performance, it shouldn't be syncing disks
> after every write.  Syncing means, in my book, "wait for the data to be
> committed to physical media" - note the *wait* involved there - so syncing
> should only be used where data integrity in the event of a system failure
> has a much higher importance than performance.

I have only gotten the drive makers in the past 6 months to commit to
actively updating the contents of the identify page to reflect reality.
Thus if your drive is one of those that does a stress test check that goes:
"this bozo did not really mean to turn off write caching, re-enabling <smirk>"

Cheers,

Andre Hedrick
Linux ATA Development
ASL Kernel Development
-
ASL, Inc. Toll free: 1-877-ASL-3535
1757 Houret Court Fax: 1-408-941-2071
Milpitas, CA 95035Web: www.aslab.com




Re: scsi vs ide performance on fsync's

2001-03-05 Thread Jonathan Morton

>I don't know if there is any way to turn off a write buffer on an IDE disk.

hdparm has an option of this nature, but it makes no difference (as I
reported).  It's worth noting that even turning off UDMA to the disk on my
machine doesn't help the situation - although it does slow things down a
little, it's not "slow enough" to indicate that the drive is behaving
properly.  Might be worth running the test on some of my other machines,
with their diverse collection of IDE controllers (mostly non-UDMA) and
disks.

>Of course, whether you should even trust the harddisk is another question.

I think this result in itself would lead me *not* to trust the hard disk,
especially an IDE one.  Has anybody tried running this test with a recent
IBM DeskStar - one of the ones that is the same mech as the equivalent
UltraStar but with IDE controller?  I only have SCSI and laptop IBMs here -
all my desktop IDE drives are Seagate.  However I do have one SCSI Seagate,
which might be worth firing up for the occasion...

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r- y+
-END GEEK CODE BLOCK-





Re: scsi vs ide performance on fsync's

2001-03-05 Thread Linus Torvalds



On Tue, 6 Mar 2001, Douglas Gilbert wrote:
> 
> > On the other hand, it's also entirely possible that IDE is just a lot
> > better than what the SCSI-bigots tend to claim. It's not all that
> > surprising, considering that the PC industry has pushed untold billions of
> > dollars into improving IDE, with SCSI as nary a consideration. The above
> > may just simply be the Truth, with a capital T.
> 
> What exactly do you think fsync() and fdatasync() should
> do? If they need to wait for dirty buffers to get flushed
> to the disk oxide then multiple reported IDE results to
> this thread are defying physics.

Well, it's fairly hard for the kernel to do much about that - it's almost
certainly just IDE doing write buffering on the disk itself. No OS
involved.

The kernel VFS and controller layers certainly wait for the disk to tell
us that the data has been written, there's no question about that. But
it's also not at all unlikely that the disk itself just lies.

I don't know if there is any way to turn off a write buffer on an IDE disk.

I do remember that there were some reports of filesystem corruption with
some version of Windows that turned off the machine at shutdown (using
software power-off as supported by most modern motherboards), and shut
down so fast that the drives had not actually written out all data.
Whether the reports were true or not I do not know, but I think we can
take for granted that write buffers exist.

Now, if you really care about your data integrity with a write-buffering
disk, I suspect that you'd better have a UPS. At which point write
buffering is a valid optimization, as long as you trust the harddisk
itself not to crash even if the OS were to crash.

Of course, whether you should even trust the harddisk is another question.

Linus




Re: scsi vs ide performance on fsync's

2001-03-05 Thread Douglas Gilbert



Linus Torvalds wrote:

> Well, it's entirely possible that the mid-level SCSI layer is doing
> something horribly stupid.

Well it's in good company as FreeBSD 4.2 on the same hardware
returns the same result (including IDE timings that were too
fast). My timepeg analysis showed that the SCSI disk was consuming
the time, not any of the SCSI layers.

> On the other hand, it's also entirely possible that IDE is just a lot
> better than what the SCSI-bigots tend to claim. It's not all that
> surprising, considering that the PC industry has pushed untold billions of
> dollars into improving IDE, with SCSI as nary a consideration. The above
> may just simply be the Truth, with a capital T.

What exactly do you think fsync() and fdatasync() should
do? If they need to wait for dirty buffers to get flushed
to the disk oxide then multiple reported IDE results to
this thread are defying physics.


Doug Gilbert



Re: scsi vs ide performance on fsync's

2001-03-05 Thread Linus Torvalds



On Tue, 6 Mar 2001, Jonathan Morton wrote:
> 
> It's pretty clear that the IDE drive(r) is *not* waiting for the physical
> write to take place before returning control to the user program, whereas
> the SCSI drive(r) is.

This would not be unexpected.

IDE drives generally always do write buffering. I don't even know if you
_can_ turn it off. So the drive claims to have written the data as soon as
it has made it into the write buffer.

It's definitely not the driver, but the actual drive.

Linus




Re: scsi vs ide performance on fsync's

2001-03-05 Thread Jonathan Morton

I've run the test on my own system and noted something interesting about
the results:

When the write() call extended the file (rather than just overwriting a
section of a file already long enough), the performance drop was seen, and
it was slower on SCSI than IDE - this is independent of whether IDE had
hardware write-caching on or off.  Where the file already existed, from an
immediately-prior run of the same benchmark, both SCSI and IDE sped up to
the same, relatively fast speed.

These runs are for the following code, writing 2000 blocks of 4096 bytes each:

fd = open("tst.txt", O_WRONLY | O_CREAT, 0644);
for (k = 0; k < NUM_BLKS; ++k) {
    write(fd, buff + (k * BLK_SIZE), BLK_SIZE);
    fdatasync(fd);
}
close(fd);
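
As a hedged illustration, the loop above could be wrapped into a self-contained timing helper along these lines. This is a sketch, not the original xlog or benchmark source: the function name `timed_sync_writes`, its parameters, and the reuse of a single block-sized buffer (the original indexes into a larger `buff`) are all assumptions made for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: writes num_blks blocks of blk_size bytes to path,
 * calling fdatasync() after each write when do_sync is non-zero.
 * Returns elapsed wall-clock seconds, or -1.0 on error. */
double timed_sync_writes(const char *path, int num_blks, size_t blk_size,
                         int do_sync)
{
    struct timeval start, end;
    char *buff;
    int fd, k;

    assert(path != NULL && num_blks >= 0 && blk_size > 0);

    buff = malloc(blk_size);
    if (buff == NULL)
        return -1.0;
    memset(buff, 'x', blk_size);

    fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        free(buff);
        return -1.0;
    }

    gettimeofday(&start, NULL);
    for (k = 0; k < num_blks; ++k) {
        /* Stop at the first short write or failed sync. */
        if (write(fd, buff, blk_size) != (ssize_t)blk_size ||
            (do_sync && fdatasync(fd) != 0)) {
            close(fd);
            free(buff);
            return -1.0;
        }
    }
    gettimeofday(&end, NULL);

    close(fd);
    free(buff);
    return (end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec) / 1e6;
}
```

Calling `timed_sync_writes("tst.txt", 2000, 4096, 1)` would approximate the first benchmark and a `blk_size` of 20 the second; passing `do_sync = 0` gives a no-sync baseline.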

IDE: Seagate Barracuda 7200rpm UDMA/66
first run:                1.98 elapsed
second and further runs:  0.50 elapsed

SCSI: IBM UltraStar 10000rpm Ultra/160
first run:                23.57 elapsed
second and further runs:  0.55 elapsed

If the test file is removed between runs, all show the longer timings.

HOWEVER if I modify the benchmark to use 2000 blocks of *20* bytes each,
the timings change.

IDE: Seagate Barracuda 7200rpm UDMA/66
first run:                1.46 elapsed
second and further runs:  1.45 elapsed

SCSI: IBM UltraStar 10000rpm Ultra/160
first run:                18.30 elapsed
second and further runs:  11.88 elapsed

Notice that the time for the second run of the SCSI drive is almost exactly
one-fifth of a minute, and remember that 2000 rotations / 10000 rpm = 1/5
minute.  IOW, the SCSI drive is performing *correctly* on the second run of
the benchmark.  The poorer performance on the first run *could* be
attributed to writing metadata interleaved with the data writes.  The
better performance on the second run of the first benchmark can easily be
attributed to the fact that the drive does not need to wait an entire
revolution before writing the next block of a file, if that block arrives
quickly enough (this is a Duron, so it darn well arrives quickly).
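
The rotational argument can be checked with a little arithmetic. The sketch below assumes, as the paragraph above does, that each synchronous write to the same disk location costs one full revolution; `min_sync_seconds` is a hypothetical helper name, and the 10000 rpm figure for the UltraStar is the one that the "one-fifth of a minute" calculation implies.

```c
#include <assert.h>

/* Lower bound, in seconds, for `writes` synchronous writes when each one
 * must wait a full platter revolution at `rpm` revolutions per minute. */
double min_sync_seconds(int writes, double rpm)
{
    assert(writes >= 0 && rpm > 0.0);
    return writes * 60.0 / rpm;
}
```

`min_sync_seconds(2000, 10000.0)` gives 12 seconds, close to the 11.88 measured on SCSI, while `min_sync_seconds(2000, 7200.0)` gives about 16.7 seconds, far above the 1.45 seconds reported for the 7200 rpm IDE drive.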

It's pretty clear that the IDE drive(r) is *not* waiting for the physical
write to take place before returning control to the user program, whereas
the SCSI drive(r) is.  Both devices appear to be performing the write
immediately, however (judging from the device activity lights).  Whether
this is the correct behaviour or not, I leave up to you kernel hackers...

IMHO, if an application needs performance, it shouldn't be syncing disks
after every write.  Syncing means, in my book, "wait for the data to be
committed to physical media" - note the *wait* involved there - so syncing
should only be used where data integrity in the event of a system failure
has a much higher importance than performance.
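
To illustrate the trade-off, the sketch below syncs once per batch of records rather than after every write. It is an assumption-laden example, not taken from xlog or any program in this thread: `write_batched` and its parameters are hypothetical, and records in a batch that has not yet been synced can be lost in a crash.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical sketch: append `count` records of `len` bytes from `rec` to
 * fd, calling fdatasync() once per `batch` records and once more at the end
 * for any partial batch. Returns the number of fdatasync() calls made, or
 * -1 on error. */
int write_batched(int fd, const char *rec, size_t len, int count, int batch)
{
    int k, pending = 0, syncs = 0;

    assert(rec != NULL && batch > 0);
    for (k = 0; k < count; ++k) {
        if (write(fd, rec, len) != (ssize_t)len)
            return -1;
        if (++pending == batch) {
            if (fdatasync(fd) != 0)
                return -1;
            ++syncs;
            pending = 0;
        }
    }
    if (pending > 0) {
        /* Flush the final, partially filled batch. */
        if (fdatasync(fd) != 0)
            return -1;
        ++syncs;
    }
    return syncs;
}
```

With `batch = 1` this degenerates to the sync-after-every-write pattern being benchmarked; larger batches pay the wait less often at the cost of a wider window of unsynced data.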

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r- y+
-END GEEK CODE BLOCK-





Re: scsi vs ide performance on fsync's

2001-03-05 Thread Linus Torvalds



On Mon, 5 Mar 2001, Jeremy Hansen wrote:
>
> Right now I'm running 2.4.2-ac11 on both machines and getting the same
> results:
>
> SCSI:
>
> [root@orville /root]# time /root/xlog file.out fsync
>
> real    0m21.266s
> user    0m0.000s
> sys     0m0.310s
>
> IDE:
>
> [root@kahlbi /root]# time /root/xlog file.out fsync
>
> real    0m8.928s
> user    0m0.000s
> sys     0m6.700s
>
> This behavior has been noticed by others, so I'm hoping I'm not just crazy
> or that my test is somehow flawed.
>
> We're using MySQL with Berkeley DB for transaction log support.  It was
> really confusing when a simple IDE workstation was outperforming our
> Ultra160 raid array.

Well, it's entirely possible that the mid-level SCSI layer is doing
something horribly stupid.

On the other hand, it's also entirely possible that IDE is just a lot
better than what the SCSI-bigots tend to claim. It's not all that
surprising, considering that the PC industry has pushed untold billions of
dollars into improving IDE, with SCSI as nary a consideration. The above
may just simply be the Truth, with a capital T.

(And "bonnie" is not a very good benchmark. It's not exactly mirroring any
real life access patterns. I would not be surprised if the SCSI driver
performance has been tuned by bonnie alone, and maybe it just sucks at
everything else)

Maybe we should ask whether somebody like lnz is interested in seeing what
SCSI does wrong here?

Linus




Re: scsi vs ide performance on fsync's

2001-03-05 Thread Jeremy Hansen

On 2 Mar 2001, Linus Torvalds wrote:

> In article <[EMAIL PROTECTED]>,
> Jeremy Hansen  <[EMAIL PROTECTED]> wrote:
> >
> >The SCSI adapter on the raid array is an Adaptec 39160, the raid
> >controller is a CMD-7040.  Kernel 2.4.0 using XFS for the filesystem on
> >the raid array, kernel 2.2.18 on ext2 on the IDE drive.  The filesystem is
> >not the problem, as I get almost the exact same results running this on
> >ext2 on the raid array.
>
> Did you try a 2.4.x kernel on both?

Finally got around to working on this.

Right now I'm running 2.4.2-ac11 on both machines and getting the same
results:

SCSI:

[root@orville /root]# time /root/xlog file.out fsync

real    0m21.266s
user    0m0.000s
sys     0m0.310s

IDE:

[root@kahlbi /root]# time /root/xlog file.out fsync

real    0m8.928s
user    0m0.000s
sys     0m6.700s

This behavior has been noticed by others, so I'm hoping I'm not just crazy
or that my test is somehow flawed.

We're using MySQL with Berkeley DB for transaction log support.  It was
really confusing when a simple IDE workstation was outperforming our
Ultra160 raid array.

Thanks
-jeremy

> 2.4.0 has a bad elevator, which may show problems, so please check 2.4.2
> if the numbers change. Also, "fsync()" is very different indeed on 2.2.x
> and 2.4.x, and I would not be 100% surprised if your IDE drive does
> asynchronous write caching and your RAID does not... That would not show
> up in bonnie.
>
> Also note how your bonnie file remove numbers for IDE seem to be much
> better than for your RAID array, so it is not impossible that your RAID
> unit just has a _huge_ setup overhead but good throughput, and that the
> IDE numbers are better simply because your IDE setup is much lower
> latency. Never mistake throughput for _speed_.
>
>   Linus

-- 
this is my sig.




Re: scsi vs ide performance on fsync's

2001-03-05 Thread Douglas Gilbert

Since the intention of fsync and fdatasync seems to be
to write dirty fs buffers to persistent storage (i.e.
the "oxide") then the best time is not necessarily
the objective. Given the IDE times that people have 
been reporting, it is very unlikely that any of those
IDE disks were really doing 2000 discrete IO operations
involving waiting for those buffers to be written
to the "oxide". [Reason: it should take at least 2000 
revolutions of the disk to do it, since most of the
4KB writes are going to the same disk address as the
prior write.]

As it stands, the Linux SCSI subsystem has no mechanism 
to force a disk cache write through. The SCSI WRITE(10)
command has a Force Unit Access bit (FUA) to do exactly
that, but we don't use it. Do the fs/block layers flag
that they wish buffers to be written to the oxide?
The measurements that showed SCSI disks were taking a lot 
longer with the "xlog" test were more luck than good 
management.
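
For reference, the FUA bit sits in byte 1 of the WRITE(10) command descriptor block. The following is a minimal sketch of building such a CDB; `build_write10` and the macro names are hypothetical, and this is not code from the Linux SCSI subsystem. Actually sending the command to a device is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WRITE_10_OPCODE 0x2A  /* SCSI WRITE(10) operation code */
#define WRITE_10_FUA    0x08  /* Force Unit Access: bit 3 of CDB byte 1 */

/* Fill a 10-byte WRITE(10) CDB for `blocks` logical blocks starting at
 * `lba`. When fua is non-zero, the FUA bit asks the target to write the
 * data to the medium before reporting completion. */
void build_write10(uint8_t cdb[10], uint32_t lba, uint16_t blocks, int fua)
{
    assert(cdb != NULL);
    memset(cdb, 0, 10);
    cdb[0] = WRITE_10_OPCODE;
    cdb[1] = fua ? WRITE_10_FUA : 0;
    cdb[2] = (uint8_t)(lba >> 24);    /* logical block address, big-endian */
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    cdb[7] = (uint8_t)(blocks >> 8);  /* transfer length, big-endian */
    cdb[8] = (uint8_t)blocks;
}
```

A 4 KB write of eight 512-byte sectors would use `blocks = 8`; whether a given drive honours FUA is a separate question from setting the bit.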

Here are some tests showing that the IDE versus SCSI "xlog"
comparison is very similar between FreeBSD 4.2 and
lk 2.4.2 on the same hardware:

# IBM DCHS04U SCSI disk 7200 rpm  <<FreeBSD 4.2>>
[root@free /var]# time /root/xlog tst.txt
real    0m0.043s
[root@free /var]# time /root/xlog tst.txt fsync
real    0m33.131s

# Quantum Fireball ST3.2A IDE disk 3600 rpm  <<FreeBSD 4.2>>
[root@free dos]# time /root/xlog tst.txt
real    0m0.034s
[root@free dos]# time /root/xlog tst.txt fsync
real    0m5.737s


# IBM DCHS04U SCSI disk 7200 rpm  <<lk 2.4.2>>
[root@tvilling extra]# time /root/xlog tst.txt
0:00.00elapsed 125%CPU
[root@tvilling spare]# time /root/xlog tst.txt fsync
0:33.15elapsed 0%CPU

# Quantum Fireball ST3.2A IDE disk 3600 rpm  <<lk 2.4.2>>
[root@tvilling /root]# time /root/xlog tst.txt
0:00.02elapsed 43%CPU
[root@tvilling /root]# time /root/xlog tst.txt fsync
0:05.99elapsed 69%CPU


Notes: FreeBSD doesn't have fdatasync() so I changed xlog 
to use fsync(). Linux timings were the same with fsync() 
and fdatasync(). The xlog program crashed immediately in
FreeBSD; it needed some sanity checks on its arguments.
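
One way to read these timings is to convert them into platter revolutions per write. The sketch below assumes the test issues 2000 synchronous writes (the figure used at the top of this message); `revs_per_write` is a hypothetical helper.

```c
#include <assert.h>

/* Average number of platter revolutions that elapsed per write, given the
 * total elapsed time in seconds, the spindle speed in rpm, and the number
 * of writes issued. */
double revs_per_write(double elapsed_s, double rpm, int writes)
{
    assert(writes > 0 && rpm > 0.0);
    return elapsed_s * rpm / 60.0 / writes;
}
```

With the FreeBSD numbers above, the 7200 rpm SCSI disk comes out at roughly 2 revolutions per write (33.131 s), while the 3600 rpm IDE disk comes out at roughly 0.17 (5.737 s), well under the one revolution that a write to the same address should need.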

One further note: I wrote:
> [snip] 
> So writing more data to the SCSI disk speeds it up!
> I suspect the critical point in the "20*200" test is
> that the same sequence of 8 512 byte sectors are being
> written to disk 200 times. BTW That disk spins at
> 15K rpm so one rotation takes 4 ms and it has a
> 4 MB cache.

A clarification: by "same sequence" I meant written
to the same disk address. If the 4 KB lies on the same
track, then a delay of one disk revolution would be
expected before you could write the next 4 KB to the 
"oxide" at the same address.

Doug Gilbert



Re: scsi vs ide performance on fsync's

2001-03-05 Thread Andi Kleen

Chris Mason <[EMAIL PROTECTED]> writes:


> filemap_fdatawait, filemap_fdatasync, and fsync_inode_buffers all restrict
> their scans to a list of dirty buffers for that specific file.  Only
> file_fsync goes through all the dirty buffers on the device, and the ext2
> fsync path never calls file_fsync.
> 
> Or am I missing something?

If the filesystems tested had blocksize < PAGE_SIZE the fsync would try
to sync everything, not walk the dirty buffers directly.
So e.g. if one of the file systems tested was generated with old ext2 utils
that do not use 4K block size then some performance difference could be 
explained.
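
Whether a given filesystem falls into this case can be checked from user space. A minimal sketch, assuming that `statvfs()` reports the filesystem block size in `f_bsize` (true for ext2 on Linux, but not guaranteed everywhere); `fs_block_smaller_than_page` is a hypothetical helper name.

```c
#include <assert.h>
#include <sys/statvfs.h>
#include <unistd.h>

/* Returns 1 if the filesystem holding `path` reports a block size smaller
 * than the system page size, 0 if it does not, and -1 on error. */
int fs_block_smaller_than_page(const char *path)
{
    struct statvfs sv;
    long page;

    assert(path != NULL);
    page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || statvfs(path, &sv) != 0)
        return -1;
    return sv.f_bsize < (unsigned long)page ? 1 : 0;
}
```

A result of 1 for the mount point under test would mean the blocksize < PAGE_SIZE case described above applies.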


-Andi




Re: scsi vs ide performance on fsync's

2001-03-05 Thread Jonathan Morton

> I don't know if there is any way to turn off a write buffer on an IDE disk.

hdparm has an option of this nature, but it makes no difference (as I
reported).  It's worth noting that even turning off UDMA to the disk on my
machine doesn't help the situation - although it does slow things down a
little, it's not "slow enough" to indicate that the drive is behaving
properly.  Might be worth running the test on some of my other machines,
with their diverse collection of IDE controllers (mostly non-UDMA) and
disks.

> Of course, whether you should even trust the harddisk is another question.

I think this result in itself would lead me *not* to trust the hard disk,
especially an IDE one.  Has anybody tried running this test with a recent
IBM DeskStar - one of the ones that is the same mech as the equivalent
UltraStar but with IDE controller?  I only have SCSI and laptop IBMs here -
all my desktop IDE drives are Seagate.  However I do have one SCSI Seagate,
which might be worth firing up for the occasion...

--
from: Jonathan "Chromatix" Morton
mail: [EMAIL PROTECTED]  (not for attachments)
big-mail: [EMAIL PROTECTED]
uni-mail: [EMAIL PROTECTED]

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-BEGIN GEEK CODE BLOCK-
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r- y+
-END GEEK CODE BLOCK-





Re: scsi vs ide performance on fsync's

2001-03-05 Thread Andre Hedrick

On Tue, 6 Mar 2001, Jonathan Morton wrote:

> It's pretty clear that the IDE drive(r) is *not* waiting for the physical
> write to take place before returning control to the user program, whereas
> the SCSI drive(r) is.  Both devices appear to be performing the write

Wrong, IDE does not unplug, thus the request is almost (I hate to admit it)
SYNC and not ASYNC :-(  Thus if the drive acks that it has the data, then
the driver lets go.

> immediately, however (judging from the device activity lights).  Whether
> this is the correct behaviour or not, I leave up to you kernel hackers...

Seagate has a better seek profile than IBM.
The second access is correct because the first one pushed the heads to the
pre-seek.  Thus the question is: where is the drive leaving the heads when
not active?  It does not appear to be in the zone 1 region.

> IMHO, if an application needs performance, it shouldn't be syncing disks
> after every write.  Syncing means, in my book, "wait for the data to be
> committed to physical media" - note the *wait* involved there - so syncing
> should only be used where data integrity in the event of a system failure
> has a much higher importance than performance.

Only in the past 6 months have I gotten the drive makers to commit to
actively updating the contents of the identify page to reflect reality.
So your drive may be one of those whose stress test check goes:
"this bozo did not really mean to turn off write caching, re-enabling" (smirk).

Cheers,

Andre Hedrick
Linux ATA Development
ASL Kernel Development
-
ASL, Inc.                      Toll free: 1-877-ASL-3535
1757 Houret Court              Fax: 1-408-941-2071
Milpitas, CA 95035             Web: www.aslab.com




Re: scsi vs ide performance on fsync's

2001-03-05 Thread Andre Hedrick

On Mon, 5 Mar 2001, Linus Torvalds wrote:

> Well, it's fairly hard for the kernel to do much about that - it's almost
> certainly just IDE doing write buffering on the disk itself. No OS
> involved.

I am pushing for WC to be defaulted in the off state, but as you know I
have a bigger fight than caching on my hands...

> I don't know if there is any way to turn off a write buffer on an IDE disk.

You want a forced set of commands to kill caching at init?

Andre Hedrick
Linux ATA Development
ASL Kernel Development
-
ASL, Inc.                      Toll free: 1-877-ASL-3535
1757 Houret Court              Fax: 1-408-941-2071
Milpitas, CA 95035             Web: www.aslab.com




Re: scsi vs ide performance on fsync's

2001-03-04 Thread Ishikawa
Douglas Gilbert wrote:

> There is definitely something strange going on here.
> As the bonnie test below shows, the SCSI disk used
> for my tests should vastly outperform the old IDE one:

First, thank you and the others for your help with my clueless investigation
of module loading under Debian GNU/Linux. (I should have known
that Debian uses a very special module setup.)

Anyway, I used to think SCSI is better than IDE in general, and
the post was quite surprising.
So I ran the test on my PC.
On my systems too, IDE beats SCSI hands down in this test case.

BTW, has anyone noticed that
the elapsed time in the SCSI case is TWICE as long if
we let the previous output of the test program stay in place before
running the second test? (I suspect fdatasync
takes time proportional to the (then current) file size, but
why the SCSI case takes so long is still beyond me.)

E.g.

ishikawa@duron$ ls -l /tmp/t.out
ls: /tmp/t.out: No such file or directory
ishikawa@duron$ time ./xlog /tmp/t.out fsync

real    0m38.673s    <=== my scsi disk is a slow one to begin with...
user    0m0.050s
sys     0m0.140s
ishikawa@duron$ ls -l /tmp/t.out
-rw-r--r--    1 ishikawa users  112000 Mar  5 06:19 /tmp/t.out
ishikawa@duron$ time ./xlog /tmp/t.out fsync

real    1m16.928s    <=== See TWICE as long!
user    0m0.060s
sys     0m0.160s
ishikawa@duron$ ls -l /tmp/t.out
-rw-r--r--    1 ishikawa users  112000 Mar  5 06:20 /tmp/t.out
ishikawa@duron$ rm /tmp/t.out    <=== REMOVE the file and try again.
ishikawa@duron$ time ./xlog /tmp/t.out fsync

real    0m40.667s    <=== Half as long and back to original.
user    0m0.040s
sys     0m0.120s
ishikawa@duron$ time ./xlog /tmp/t.out xxx

real    0m0.012s     <=== very fast without fdatasync as it should be.
user    0m0.010s
sys     0m0.010s
ishikawa@duron$

