Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-16 Thread James Sutherland

On Sat, 16 Sep 2000, Giuliano Pochini wrote:

> I wrote, then got air-brushed out of the thread?!
> > That's one approach; I prefer my "weighted scoring" approach. Supposing we
> > have three devices: a solid state disk (instant "seeks"), a hard drive and
> > a tape. The SSD will benefit from merges (fewer commands to process), but
> > not hugely - set both the metrics at 1, so a 64Kb request is just under
> > twice the "cost" of a 32Kb one. The hard drive can seek fairly quickly,
> > but long reads are preferable - say, seek_cost = 16, block_cost = 1. A
> > single 64Kb request is "cheaper" than a pair of 32Kb requests, but not
> > hugely so. Then the tape: seeks can take a few minutes, so make it
> > seek_cost = 65536, block_cost = 1: we don't really care how much is being
> > read, within reason, since seeks are so "expensive".
> 
> If we could read the maximum/typical/minimum seek and transfer speed
> directly from the drive by a SCSI command, things would be a lot
> simpler :))

We can almost do that with any device; if we have a nice interface for
user-space tuning (a few /proc files), we can be told at boot-time what
the characteristics of each device are. A simple piece of code which does
a couple of seeks and a big read or two on each device would achieve this
easily enough.

We don't need to know the exact characteristics, but it could be of some
use in fine-tuning performance; my overall approach should work reasonably
well for any given device, I think?
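
To make the quoted "weighted scoring" idea concrete, here is a minimal
sketch (the seek_cost/block_cost names follow the text above; the
structure and units are hypothetical, not actual kernel code):

struct dev_params {
        unsigned int seek_cost;         /* cost of one seek */
        unsigned int block_cost;        /* cost per Kb transferred */
};

/* Total "cost" of servicing one request of nr_kb kilobytes. */
static unsigned int request_cost(const struct dev_params *dp,
                                 unsigned int nr_kb)
{
        return dp->seek_cost + nr_kb * dp->block_cost;
}

With the hard-drive numbers above (seek_cost = 16, block_cost = 1), one
64Kb request costs 16 + 64 = 80 while two 32Kb requests cost 2*16 + 64 =
96: the merge is cheaper, but not hugely so. With the tape's seek_cost =
65536, the merge wins overwhelmingly, exactly as described.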


James.




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-16 Thread Giuliano Pochini


> The current HUGE queue size is probably another reason for
> the very bad latencies we sometimes see...

Yes, but if the queue is short, processes are likely to remain
blocked in D state for longer, and the chances of merging rqs
are smaller.
IMHO we should add a way to give priority to rqs from live
tasks; e.g. a "cp" usually puts its stuff on the queue and then
exits, so there's no need to service that I/O immediately.

> > I don't think you should allow merges. If one process is doing a big
> > IO operation on a big file it would still get _all_ capacity, right?
>
> Only if you were to allow unlimited merges ;)
>
> I guess a 'decent' maximum size for one request would
> be somewhere around 256kB, since this is the amount of
> data a drive can read in the time of one disk seek...

Today's HDs with transfer rates of 10+ MB/s are not uncommon, so 256KB
is a bit too small IMHO. SCSI disks that don't support tagged queueing
(e.g. Quantum) take a performance hit.

Bye.





Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-16 Thread Giuliano Pochini




> That's one approach; I prefer my "weighted scoring" approach. Supposing we
> have three devices: a solid state disk (instant "seeks"), a hard drive and
> a tape. The SSD will benefit from merges (fewer commands to process), but
> not hugely - set both the metrics at 1, so a 64Kb request is just under
> twice the "cost" of a 32Kb one. The hard drive can seek fairly quickly,
> but long reads are preferable - say, seek_cost = 16, block_cost = 1. A
> single 64Kb request is "cheaper" than a pair of 32Kb requests, but not
> hugely so. Then the tape: seeks can take a few minutes, so make it
> seek_cost = 65536, block_cost = 1: we don't really care how much is being
> read, within reason, since seeks are so "expensive".

If we could read the maximum/typical/minimum seek and transfer speed
directly from the drive by a SCSI command, things would be a lot simpler :))

Bye.




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-16 Thread Jamie Lokier

[EMAIL PROTECTED] wrote:
> If we played some "zippy" music that would add to the "feel". Of
> course, we could actually use benchmarks instead.

Benchmarks, famed for their universal appeal :-)

> And, to me, if kernel compiles take longer, I don't care how fast it "feels".

A kernel compile _by itself_ should always run at full speed.

-- Jamie



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-16 Thread yodaiken

On Sat, Sep 16, 2000 at 02:15:16PM +0200, Jamie Lokier wrote:
> [EMAIL PROTECTED] wrote:
> > > Sure the global system is slower.  But the "interactive feel" is faster.
> > 
> > Let's pop up little buttons to make it "feel" faster.
> 
> If little buttons pop up quickly when I click on them, then yes that's
> better interactive feel.  Sometimes the disk is involved in this.
> 
> Nobody has a problem with tuning the task scheduler to work this way.

If we played some "zippy" music that would add to the "feel". Of course,
we could actually use benchmarks instead.
And, to me, if kernel compiles take longer, I don't care how fast it
"feels".

> 
> -- Jamie

-- 
-
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-16 Thread Jamie Lokier

[EMAIL PROTECTED] wrote:
> > Sure the global system is slower.  But the "interactive feel" is faster.
> 
> Let's pop up little buttons to make it "feel" faster.

If little buttons pop up quickly when I click on them, then yes that's
better interactive feel.  Sometimes the disk is involved in this.

Nobody has a problem with tuning the task scheduler to work this way.

-- Jamie



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-15 Thread yodaiken

On Tue, Sep 12, 2000 at 04:30:32PM +0200, Jamie Lokier wrote:
> Sure the global system is slower.  But the "interactive feel" is faster.

Let's pop up little buttons to make it "feel" faster.


-- 
-
Victor Yodaiken 
Finite State Machine Labs: The RTLinux Company.
 www.fsmlabs.com  www.rtlinux.com




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-15 Thread Andrea Arcangeli

On Fri, Sep 15, 2000 at 09:08:25PM -0400, Giuliano Pochini wrote:
> Which is tightly dependent on the way we insert new rqs.

Sure, the implementation would differ depending on the way we order the
requests during insertion, but the conceptual algorithm of the latency
control could remain the same even if we slightly change the ordering
algorithm.

The reason the implementation would differ is that the two are implemented
together, to traverse the queue as few times as possible.

> Rqs are added according to the IN_ORDER() condition which enforces ascending
> ordering. What does CSCAN mean ?

CSCAN means Cyclic SCAN. It enforces ascending order, and when it reaches
the last cylinder it causes a large seek back to the start of the disk.
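
As a hedged sketch (hypothetical types, not the actual blkdev code),
CSCAN ordering can be expressed by keying each sector on its distance
along the current sweep:

struct rq {
        unsigned long sector;
        struct rq *next;
};

/* Sectors at or ahead of the head come first; sectors behind it wrap
 * around one full disk later. */
static unsigned long cscan_key(unsigned long sector, unsigned long head,
                               unsigned long disk_size)
{
        return sector >= head ? sector - head : sector - head + disk_size;
}

static void cscan_insert(struct rq **q, struct rq *new,
                         unsigned long head, unsigned long disk_size)
{
        unsigned long key = cscan_key(new->sector, head, disk_size);

        while (*q && cscan_key((*q)->sector, head, disk_size) <= key)
                q = &(*q)->next;
        new->next = *q;
        *q = new;
}

With the head at 12 on a 16-sector disk, the queue 12 13 14 1 2 3 gets
keys 0 1 2 5 6 7, and inserting 15 (key 3) lands between 14 and 1, giving
12 13 14 15 1 2 3 - the pure-CSCAN case, before any latency barrier
intervenes.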

> [..] When we insert 15 the list becomes 12 13 14 1 2 3 15.

You're right.

In fact the only reason we need the latency control stuff is that 14 has
to become a barrier eventually (and 15 has to become a barrier eventually
as well) if we want them to be served by the lowlevel driver; otherwise
they could be passed over for way too long a time (long enough to write
terabytes of data to disk) by requests with lower blocknrs. This was the
elevator latency bugfix.

Andrea



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-15 Thread Giuliano Pochini


> >It's a good way to avoid stalls, but IMHO I think the elevator should
>
> Here you're talking about the latency control logic.

Which is tightly dependent on the way we insert new rqs.

> >not work this way. The main problem is that it doesn't minimize head
> >movement. For example, when a request comes for a sector at the
>
> The algorithm that controls the latency and the one that enforces the
> ordering are two separate things.
>
> For example in 2.2.15 there's only the latter (2.2.15 was missing a
> latency control mechanism).
>
> >beginning of the disk it goes to the beginning of the queue (it's
> >ordered) so it will be serviced before other rqs (it picks up rqs from
> >the head). This way the higher the block number, the more likely that
> >rq will wait a lot.
>
> We're not using a SCAN ordering algorithm, it's instead CSCAN. The queue
> can be for instance:
>
> 12 13 14 1 2 3
>
> Now insert 15 (assume 15 is the last sector of the disk):
>
> 12 13 14 15 1 2 3

Rqs are added according to the IN_ORDER() condition, which enforces
ascending ordering. What does CSCAN mean?
I.e., if the queue initially is 1 2 3 and we insert 14, it will become
1 2 3 14. If it's 12 13 14 and we insert 2: 2 12 13 14.
So your example can happen only if the list is 12 13 14 and "14" is the
latency barrier. When we insert 15 the list becomes 12 13 14 1 2 3 15.
Am I missing something?
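
For contrast, a hedged sketch of plain ascending insertion (hypothetical
types; the real IN_ORDER() macro and latency-barrier handling are more
involved):

struct rq {
        unsigned long sector;
        struct rq *next;
};

/* Insert in plain ascending sector order: a low-numbered request
 * always migrates to the front, so inserting 2 into "12 13 14"
 * yields "2 12 13 14", as described above. */
static void ascending_insert(struct rq **q, struct rq *new)
{
        while (*q && (*q)->sector <= new->sector)
                q = &(*q)->next;
        new->next = *q;
        *q = new;
}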

> PS. Thanks for having read and understood the internals of the 2.4.x
> elevator before posting your comments, it simplifies very much the
> communication.

I had to study it while hunting for the no-merge bug :)

Bye.




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-14 Thread Hans Reiser

Ragnar Kjørstad wrote:
> 
> On Wed, Sep 13, 2000 at 11:22:16AM -0400, Michael T. Babcock wrote:
> > If I may ask a potentially stupid question, how can request latency be
> > anything but a factor of time?  Latency is how /long/ you (or the computer)
> > /waits/ for something.  That defines it as a function of time.
> 
> Latency is of course a factor of time, but the point is that the
> acceptable latency differs from device to device. For a slower device
> longer latency must be acceptable, and if the relationship is linear,
> then using number of requests may be a simpler and better way of doing
> it.
>

Latency is a function of time, but the units of time need not be seconds;
in theory they could be requests.  In this particular case, measuring time
in units of requests is inappropriate.

We have a choice between having users (or device driver authors, for the
defaults) estimate how many milliseconds is reasonable for a given device,
or having them estimate how many requests can be guaranteed not to exceed
the time they can wait for a particular device.  I would rather take
responsibility for knowing not to make my MP3 buffering unreasonably small
for a floppy device that multiple processes are accessing (since the
performance effects would be disastrous), than take responsibility for
estimating how many requests will add up to no more than X milliseconds of
buffering on my hard drive.

Number of requests is irrelevant to realtime I/O scheduling; seconds are
relevant to realtime I/O scheduling.

There are two problems to be solved: fairness, and realtime I/O
scheduling.  Fairness can be handled effectively by measuring time in
units of requests.  Guaranteeing maximum latencies requires traditional
time units.
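
A hedged sketch of the two mechanisms Hans separates (the type and fields
below are hypothetical, not the kernel's):

struct rq {
        unsigned long deadline;         /* realtime bound: absolute time */
        unsigned int  passed;           /* fairness bound: times bypassed */
};

/* Fairness can be policed by counting requests: refuse to let new
 * requests jump ahead once this one has been passed N times. */
static int may_bypass(const struct rq *old, unsigned int max_passes)
{
        return old->passed < max_passes;
}

/* A realtime guarantee needs wall-clock time: once the deadline
 * arrives, the request must be serviced next regardless of seek
 * order (counter wraparound is ignored in this sketch). */
static int must_service_now(const struct rq *r, unsigned long now)
{
        return now >= r->deadline;
}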

Hans



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-14 Thread James Sutherland

On Wed, 13 Sep 2000, Rik van Riel wrote:

> On Thu, 14 Sep 2000, Ragnar Kjørstad wrote:
> This (IMHO) is wrong. When the drive has trouble keeping up
> with requests in FCFS order, we should do some elevator sorting
> to keep throughput high enough to deal with the requests we get.

Reversing that, you mean the drive should be given requests on an FCFS
basis until it is running flat out, at which point elevator sorting should
come into play? That's what I would suggest: under light loads, requests
are then handled with no extra latency at all, but we still get increased
throughput under heavier load.

> Jeff's idea would take care of this, together with a *limited*
> (say, 32 entries) queue size. Every time the "work" queue (the
> one the disk is working on) is empty, we switch queues.

Then process one queue until it's empty, even if there are higher priority
requests in another queue? I'm not so sure about that: then my backup
program or a file copy will be getting between my MP3 player and the disk.

> Processes get to put their request on the other queue, either
> until the disk takes their requests (the queues are switched)
> or until the queue is full. That way only the N highest priority
> processes can get access to the disk if we're really overloaded
> in a bad way.

How does your approach block lower priority processes? If they were to
issue a request before a higher priority one, it is inserted into the
"filling" queue; once we switch queues, this request will be handled -
perhaps before a higher priority request, if we're unlucky with the
timing?

If we can keep it sorted reasonably efficiently, a single rolling queue
seems better - or perhaps model it on the process scheduler? Have
"realtime" requests ("it doesn't matter if everything else locks up, this
request MUST be completed ASAP"), then "other" requests of various
priorities?

> The current HUGE queue size is probably another reason for
> the very bad latencies we sometimes see...

If the code ever sits there sorting the queue out, while the disk goes
idle, the code needs adjusting: getting requests processed in a
sub-optimal order is still better than not getting any requests processed
at all for some period of time!

> > > (maybe with the twist that we /do/ allow merges on the queue
> > > that's being processed by the disk, but no insertions of new
> > > requests)
> > 
> > I don't think you should allow merges. If one process is doing a big 
> > IO operation on a big file it would still get _all_ capacity, right?
> 
> Only if you were to allow unlimited merges ;)
> 
> I guess a 'decent' maximum size for one request would
> be somewhere around 256kB, since this is the amount of
> data a drive can read in the time of one disk seek...

That's one approach; I prefer my "weighted scoring" approach. Supposing we
have three devices: a solid state disk (instant "seeks"), a hard drive and
a tape. The SSD will benefit from merges (fewer commands to process), but
not hugely - set both the metrics at 1, so a 64Kb request is just under
twice the "cost" of a 32Kb one. The hard drive can seek fairly quickly,
but long reads are preferable - say, seek_cost = 16, block_cost = 1. A
single 64Kb request is "cheaper" than a pair of 32Kb requests, but not
hugely so. Then the tape: seeks can take a few minutes, so make it
seek_cost = 65536, block_cost = 1: we don't really care how much is being
read, within reason, since seeks are so "expensive".


In short, having a single rolling queue (probably divided into priority
classes) seems preferable; we should also bypass the elevator entirely if
the device is able to accept new commands at the time. (Q: how intelligent
will drives which handle command queueing themselves be?)
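
A hedged sketch of that bypass rule (the device interface below is
invented for illustration, not a real driver API):

struct device;
struct rq;

/* all four helpers are hypothetical */
int  elevator_empty(struct device *dev);
int  device_can_accept(struct device *dev);
void device_issue(struct device *dev, struct rq *r);
void elevator_add(struct device *dev, struct rq *r);

static void submit_request(struct device *dev, struct rq *new)
{
        if (elevator_empty(dev) && device_can_accept(dev))
                device_issue(dev, new);   /* light load: no queueing delay */
        else
                elevator_add(dev, new);   /* busy: sort, merge, prioritise */
}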


James.




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Ragnar Kjørstad

On Wed, Sep 13, 2000 at 08:08:12PM -0300, Rik van Riel wrote:
> On Thu, 14 Sep 2000, Ragnar Kjørstad wrote:
> > On Wed, Sep 13, 2000 at 06:57:21PM -0300, Rik van Riel wrote:
> > > > Another potentially stupid question:
> > > > When the queue gets too long/old, new requests should be put in
> > > > a new queue to avoid starvation for the ones in the current
> > > > queue, right?
> > > 
> > > Indeed. That would solve the problem...
> > 
> > I think something like this: (I don't know the current code, so maybe
> > this suggestion is really stupid.)
> > 
> > struct request_queue {
> >         int4                    start_time;
> >         struct request          *list;
> >         struct request_queue    *next;
> > };
> > 
> > struct request_queue *current_queue;
> > 
> > function too_old (struct request_queue *q) {
> >         /* can actually easily be implemented using time, nr of requests
> >            or any other measure of amount of work */
> >         if (current_time > q->start_time + MAXTIME)
> >                 return 1;
> >         else
> >                 return 0;
> > }
> > 
> > function insert_request(struct request *new) {
> >         struct request_queue *q = current_queue;
> >         if (new->sectornr < current_sector)     /* behind the head: skip ahead */
> >                 q = q->next;
> >         while (too_old(q))
> >                 q = q->next;
> >         insert_request(q->list, new);
> > }
> 
> This (IMHO) is wrong. When the drive has trouble keeping up
> with requests in FCFS order, we should do some elevator sorting
> to keep throughput high enough to deal with the requests we get.

The "list" in each queue should be sorted. So, even if the queue is
too old, the requests will still be sorted, but only within each queue.

> Jeff's idea would take care of this, together with a *limited*
> (say, 32 entries) queue size. Every time the "work" queue (the
> one the disk is working on) is empty, we switch queues.

OK, I called the "work" queue my "current" queue in the suggestion
above, but other than that it may not be so different.

> Processes get to put their request on the other queue, either
> until the disk takes their requests (the queues are switched)
> or until the queue is full. That way only the N highest priority
> processes can get access to the disk if we're really overloaded
> in a bad way.

I'm not sure I understand this.

New requests should always be put in the first queue that has room (is
not too old). This means that no request will be shifted too far off
its original order. (in other words, it is not too unfair)

> > I don't think you should allow merges. If one process is doing a big 
> > IO operation on a big file it would still get _all_ capacity, right?
> 
> Only if you were to allow unlimited merges ;)
> 
> I guess a 'decent' maximum size for one request would
> be somewhere around 256kB, since this is the amount of
> data a drive can read in the time of one disk seek...

Yes, this sounds like a good idea. (Don't know about the size though,
but testing will show)


-- 
Ragnar Kjørstad



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Rik van Riel

On Thu, 14 Sep 2000, Ragnar Kjørstad wrote:
> On Wed, Sep 13, 2000 at 06:57:21PM -0300, Rik van Riel wrote:
> > > Another potentially stupid question:
> > > When the queue gets too long/old, new requests should be put in
> > > a new queue to avoid starvation for the ones in the current
> > > queue, right?
> > 
> > Indeed. That would solve the problem...
> 
> I think something like this: (I don't know the current code, so maybe
> this suggestion is really stupid.)
> 
> struct request_queue {
>         int4                    start_time;
>         struct request          *list;
>         struct request_queue    *next;
> };
> 
> struct request_queue *current_queue;
> 
> function too_old (struct request_queue *q) {
>         /* can actually easily be implemented using time, nr of requests
>            or any other measure of amount of work */
>         if (current_time > q->start_time + MAXTIME)
>                 return 1;
>         else
>                 return 0;
> }
> 
> function insert_request(struct request *new) {
>         struct request_queue *q = current_queue;
>         if (new->sectornr < current_sector)     /* behind the head: skip ahead */
>                 q = q->next;
>         while (too_old(q))
>                 q = q->next;
>         insert_request(q->list, new);
> }

This (IMHO) is wrong. When the drive has trouble keeping up
with requests in FCFS order, we should do some elevator sorting
to keep throughput high enough to deal with the requests we get.

Jeff's idea would take care of this, together with a *limited*
(say, 32 entries) queue size. Every time the "work" queue (the
one the disk is working on) is empty, we switch queues.

Processes get to put their request on the other queue, either
until the disk takes their requests (the queues are switched)
or until the queue is full. That way only the N highest priority
processes can get access to the disk if we're really overloaded
in a bad way.
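
A hedged sketch of that two-queue scheme as described (Jeff Merkey's
"close the door" idea; all types and helpers are hypothetical):

#define QUEUE_LIMIT 32                  /* the "32 entries" cap above */

struct rq;
struct rq_list {
        struct rq *head;
        int count;
};

void sorted_insert(struct rq_list *l, struct rq *r);    /* hypothetical */
struct rq *remove_head(struct rq_list *l);              /* hypothetical */

static struct rq_list queues[2];
static int work;                        /* queue the disk is draining */

/* New requests go only to the "fill" queue; the door closes when it
 * is full and the caller must wait. */
static int submit(struct rq *new)
{
        struct rq_list *fill = &queues[work ^ 1];

        if (fill->count >= QUEUE_LIMIT)
                return -1;              /* door closed */
        sorted_insert(fill, new);       /* elevator order within queue */
        fill->count++;
        return 0;
}

static struct rq *next_for_disk(void)
{
        if (queues[work].head == NULL)
                work ^= 1;              /* work queue empty: switch */
        return remove_head(&queues[work]);
}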

The current HUGE queue size is probably another reason for
the very bad latencies we sometimes see...

> > (maybe with the twist that we /do/ allow merges on the queue
> > that's being processed by the disk, but no insertions of new
> > requests)
> 
> I don't think you should allow merges. If one process is doing a big 
> IO operation on a big file it would still get _all_ capacity, right?

Only if you were to allow unlimited merges ;)

I guess a 'decent' maximum size for one request would
be somewhere around 256kB, since this is the amount of
data a drive can read in the time of one disk seek...
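
(As a sanity check on that figure, with assumed numbers: a ~10ms seek
plus rotational delay and a ~25MB/s media rate give 10ms x 25MB/s ~=
256kB of sequential transfer per seek.)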

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Ragnar Kjørstad

On Wed, Sep 13, 2000 at 06:57:21PM -0300, Rik van Riel wrote:
> > Another potentially stupid question:
> > When the queue gets too long/old, new requests should be put in
> > a new queue to avoid starvation for the ones in the current
> > queue, right?
> 
> Indeed. That would solve the problem...

I think something like this: (I don't know the current code, so maybe
this suggestion is really stupid.)

struct request_queue {
        int4                    start_time;
        struct request          *list;
        struct request_queue    *next;
};

struct request_queue *current_queue;

function too_old (struct request_queue *q) {
        /* can actually easily be implemented using time, nr of requests
           or any other measure of amount of work */
        if (current_time > q->start_time + MAXTIME)
                return 1;
        else
                return 0;
}

function insert_request(struct request *new) {
        struct request_queue *q = current_queue;
        if (new->sectornr < current_sector)     /* behind the head: skip ahead */
                q = q->next;
        while (too_old(q))
                q = q->next;
        insert_request(q->list, new);
}

> (maybe with the twist that we /do/ allow merges on the queue
> that's being processed by the disk, but no insertions of new
> requests)

I don't think you should allow merges. If one process is doing a big 
IO operation on a big file it would still get _all_ capacity, right?



-- 
Ragnar Kjørstad



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Ragnar Kjørstad

On Wed, Sep 13, 2000 at 08:08:51PM -0400, Giuliano Pochini wrote:
> > And if so, in what unit do you want to measure
> > latency if it isn't time?
> 
> I had a look at the new elevator. I hope I'm not wrong here... Requests
> are added to the queue by scanning it starting from the tail. When the
> new request is found in order with an existing one, it's inserted into
> the queue, unless it finds an existing request that is out of latency
> units. In that case the new rq is added before it (so it will be
> serviced after).

If I understand this correctly, that means once the queue is too long
(latency-wise) the elevator will stop working completely, because all
requests will be served in the order they come in, right?

If multiple queues are used, so performance is OK even when the first
queue is full (too old), the discussion about how to tell if the queue
is too old or not (by time or requests) becomes less important. After
all, there is no reason not to start a new queue pretty often, even on a
fast device.

The correct behavior would be to reorder the new requests
individually even if the queue is already too long. In other words:
multiple queues.

> not work this way. The main problem is that it doesn't minimize head
> movement. For example, when a request comes for a sector at the
> beginning of the disk, it goes to the beginning of the queue (it's
> ordered), so it will be serviced before other rqs (it picks up rqs from
> the head). This way, the higher the block number, the more likely that
> rq will wait a lot.

If a new request is made for a sector before the current sector (the
sector of the last request to be served), the new request should be
added to the second queue even if the first one is not too old.



-- 
Ragnar Kjørstad



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Rik van Riel

On Wed, 13 Sep 2000, Ragnar Kjørstad wrote:
> On Wed, Sep 13, 2000 at 11:22:16AM -0400, Michael T. Babcock wrote:
> > If I may ask a potentially stupid question, how can request latency be
> > anything but a factor of time?  Latency is how /long/ you (or the computer)
> > /waits/ for something.  That defines it as a function of time.
> 
> Latency is of course a factor of time, but the point is that the
> acceptable latency differs from device to device. For a slower device
> longer latency must be acceptable, and if the relationship is linear,
> then using number of requests may be a simpler and better way of doing
> it.
> 
> Another potentially stupid question:
> When the queue gets too long/old, new requests should be put in
> a new queue to avoid starvation for the ones in the current
> queue, right?

Indeed. That would solve the problem...

> So if this is done by time, how do you know when the oldest
> request get too old? You would need to index the requests both
> by sector and time, and thus performance overhead, right?

> If you, however have a simple rule that max 100 requests should
> be put in each queue, it's easy to know when to start a new one.

The idea Jeff Merkey gave us is even simpler and should work
OK every time. I think I'll implement it RSN and see if it
works well for Linux.

(maybe with the twist that we /do/ allow merges on the queue
that's being processed by the disk, but no insertions of new
requests)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Ragnar Kjørstad

On Wed, Sep 13, 2000 at 11:22:16AM -0400, Michael T. Babcock wrote:
> If I may ask a potentially stupid question, how can request latency be
> anything but a factor of time?  Latency is how /long/ you (or the computer)
> /waits/ for something.  That defines it as a function of time.

Latency is of course a factor of time, but the point is that the
acceptable latency differs from device to device. For a slower device
longer latency must be acceptable, and if the relationship is linear,
then using number of requests may be a simpler and better way of doing
it.


Another potentially stupid question:
When the queue gets too long/old, new requests should be put in a new
queue to avoid starvation for the ones in the current queue, right?
So if this is done by time, how do you know when the oldest request gets
too old? You would need to index the requests both by sector and time,
and thus incur performance overhead, right?
If you, however, have a simple rule that max 100 requests should be put
in each queue, it's easy to know when to start a new one. The number 100
should be found by calculating how many requests can be served within
the acceptable latency.
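
(A hedged worked example, with assumed numbers: if a disk averages ~10ms
per request (seek + rotation + transfer) and one second is the acceptable
worst-case latency, the cap is 1s / 10ms = 100 requests per queue; a
device averaging 2ms per request could allow ~500.)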


-- 
Ragnar Kjørstad



Re: (reiserfs) Re: More on 2.2.18pre2aa2 (summary of elevator ideas)

2000-09-13 Thread Andrew Scott

On 12 Sep 2000, at 18:08, Ed Tomlinson wrote:

> Hi,
> 
> I made the comment because I remember back when the discussion was current
> on linux-kernel.  I thought Jeff Merkey's message was to the point.  Para-
> phrasing from memory, it was something to the effect that Novell had
> tried many elevators.  All had problems with some loads.  The best they
> had found was the 'close the door' idea.  I do not remember if the door
> was based on requests or time.  Another point to remember is that the
> netware people came up with what they considered a good solution.

I believe that the Netware elevator is based on outstanding requests. 
This is a tunable parameter which may be increased for fast disk 
subsystems.

I think that you could consider the number of requests to be loosely 
related to time. That is, the time to service 50 requests should be 
fairly predictable for a given disk/controller. I don't think you 
need to time stamp every request to get good results.


--Mailed via Pegasus 3.12 & Mercury 1.44---
[EMAIL PROTECTED]                      Fax (617)373-2942
Andrew Scott                           Tel (617)373-5278
Northeastern University, College of Arts & Sciences, Deans Office
138 Meserve Hall, Boston, Ma. 02115




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Giuliano Pochini


> > Going in function of time is obviously wrong. A blockdevice can
> > write 1 request every two seconds or 1 request every msecond.
> > You can't assume anything in function of time _unless_ you have
> > per harddisk timing informations into the kernel.
>
> Uhmmm, isn't the elevator about request /latency/ ?
>
> And if so, in what unit do you want to measure
> latency if it isn't time?

I had a look at the new elevator. I hope I'm not wrong here... Requests
are added to the queue by scanning it starting from the tail. When the
new request is found in order with an existing one, it's inserted into
the queue, unless it finds an existing request that is out of latency
units; in that case the new rq is added before it (so it will be serviced
after). After it has inserted the request, it walks back and decreases
all the latency counters.
So, the "unit" used to measure latency is "number of requests": new
requests are not allowed to go in front of an existing one more than N
times.
It's a good way to avoid stalls, but IMHO I think the elevator should
not work this way. The main problem is that it doesn't minimize head
movement. For example, when a request comes for a sector at the
beginning of the disk, it goes to the beginning of the queue (it's
ordered), so it will be serviced before other rqs (it picks up rqs from
the head). This way, the higher the block number, the more likely that
rq will wait a lot.
Again... if I understood correctly how the elevator works.
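
A hedged sketch of the insertion rule as described above (modelled on the
description, not on the actual 2.4 source; the types are hypothetical):

struct rq {
        unsigned long sector;
        int latency;            /* times this rq may still be passed */
        struct rq *next;
};

/* The new request sits in ascending sector order, but may not be
 * placed in front of a request whose latency credit is exhausted;
 * every request it ends up in front of is "passed" once more and
 * loses a credit. */
static void elevator_insert(struct rq **queue, struct rq *new,
                            int max_latency)
{
        struct rq **p = queue;
        struct rq *r;

        while (*p && ((*p)->sector <= new->sector || (*p)->latency <= 0))
                p = &(*p)->next;

        new->latency = max_latency;
        new->next = *p;
        *p = new;

        for (r = new->next; r != NULL; r = r->next)
                r->latency--;
}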

Bye.



Re: (reiserfs) Re: More on 2.2.18pre2aa2 (summary of elevator ideas)

2000-09-13 Thread Jeff V. Merkey


You're welcome.

:-)

Jeff

Hans Reiser wrote:
> 
> "Jeff V. Merkey" wrote:
> >
> > One important point on remirroring I did not mention in my post.  In
> > NetWare, remirroring scans the disk BACKWARDS (n -> 0) to prevent
> > artificial starvation while remirroring is going on.  This was another
> > optimization we learned the hard way by trying numerous approaches to
> > the problem.
> >
> > Jeff
> 
> Interesting.  Thanks Jeff.



(reiserfs) Re: More on 2.2.18pre2aa2 (summary of elevator ideas)

2000-09-13 Thread Rogier Wolff

> "Jeff V. Merkey" wrote:
> > 
> > One important point on remirroring I did not mention in my post.  In
> > NetWare, remirroring scans the disk BACKWARDS (n -> 0) to prevent
> > artificial starvation while remirroring is going on.  This was another
> > optimization we learned the hard way by trying numerous approaches to
> > the problem.

Don't you want to do it a megabyte at a time to prevent abysmal
disk performance? (i.e. take the separate megabytes backwards, but do
every megabyte forwards)

instead of
        for (b = max; b > 0; b--)
                remirror_block(b);      /* whatever per-block work remirroring does */
do
        for (bb = max - STEP; bb > 0; bb -= STEP)
                for (b = bb; b < bb + STEP; b++)
                        remirror_block(b);

Rogier Wolff  http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
*   Common sense is the collection of                                    *
**  prejudices acquired by age eighteen.   -- Albert Einstein            **




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Michael T. Babcock

- Original Message -
From: "Rik van Riel" <[EMAIL PROTECTED]>


> On Tue, 12 Sep 2000, Andrea Arcangeli wrote:
> > On Tue, 12 Sep 2000, Rik van Riel wrote:
> >
> > >Uhmmm, isn't the elevator about request /latency/ ?
> >
> > Yes, but definitely not absolute "time" latency.
>
> So what do you think about the idea Jeff Merkey
> presented? It seems to work well for Netware, so
> maybe we should try for Linux?

If I may ask a potentially stupid question, how can request latency be
anything but a factor of time?  Latency is how /long/ you (or the computer)
/waits/ for something.  That defines it as a function of time.

Just curious what the full comment being made is ...
... because I'm suspicious that the terms are being confused, and there is
some discussion about reducing drive access latency within the context of
not moving the heads too much, etc., without the context of the user-space
application waiting for data.




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread James Sutherland

On Wed, 13 Sep 2000, Andrea Arcangeli wrote:

> On Wed, 13 Sep 2000, Mitchell Blank Jr wrote:
> 
> >The "large queue" goes against the whole point of this exercise - that
> >is that if there are many items in the "queue" being sorted then
> >unlucky requests can end up waiting a long time to get serviced.
> 
> Yep.

Ehmm... that large queue is a backlog of I/O requests. Since that's a
function of the workload and the disk speed, you can't reduce it without
either cutting the workload or speeding up the disk!

I suspect we were talking at cross purposes here a little. I was NOT
suggesting holding back requests to build up a big queue - I was talking
about the situation where we have a large backlog of requests pending.


James.




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread James Sutherland

On Wed, 13 Sep 2000, Mitchell Blank Jr wrote:

> James Sutherland wrote:
> > In terms of latency, I'd suggest we aim to keep the device in use all the
> > time we have outstanding requests: every time the device is ready to
> > accept a request, we feed it the "next" one in the queue; until it is free
> > again, requests pile up in the queue, being sorted, merged etc.
> > 
> > This should perform very well under light loads - requests just get passed
> > to the device as fast as they can be handled - and under very heavy loads
> > - we have a large queue in play, with plenty of scope for reordering,
> > merges etc.
> 
> The "large queue" goes against the whole point of this exercise - that
> is that if there are many items in the "queue" being sorted then
> unlucky requests can end up waiting a long time to get serviced.

Yes, you need some sort of priority boost to avoid starving unlucky
processes. Tracking a sort of I/O goodness value may help here: if I fire
off a copy of a large file, and a "find /", the former will get much
higher throughput due to large consecutive reads. As it does so, though,
it reduces its own I/O priority; 'eventually' it will be below find, and
find gets some I/O bandwidth.
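
A hedged sketch of that "I/O goodness" idea (entirely hypothetical, by
analogy with the CPU scheduler's counter):

struct io_account {
        int goodness;                   /* current I/O priority */
};

/* Streaming I/O spends credit as it completes... */
static void charge_io(struct io_account *a, unsigned int sectors)
{
        a->goodness -= sectors;         /* big sequential reads cost more */
}

/* ...and credit is refilled periodically, so a bulk copier decays
 * below a modest consumer like find and yields some bandwidth. */
static void refill_io(struct io_account *a, int base)
{
        a->goodness = a->goodness / 2 + base;
}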

> You want to have some reasonable break-off point where you just
> decide to make an elevator swipe in order to avoid starving those
> requests.

Hrmm... not quite the system I had in mind. I specifically want to avoid
delaying any requests: when possible, we just pass requests straight to
the drive. The queue of outstanding requests only arises in the first
place when the drive is busy.

> Keep in mind that I'm using the word "queue" in a confusing manner
> here - we're talking about how much we'll try to elevator-sort in
> a go, not the actual queue of all requests.  I've been calling this
> a queue because from a performance standpoint that's what it is,
> so I think it helps to think of the problem that way.

Ahh... I see. If we have a large enough backlog of requests that we can't
even elevator sort them fast enough, there's something very odd going
on. Still, I was planning to impose a maximum limit on the backlog - the
question is, what to do when that limit is reached??

> Oh yeah, one other thing about huge elevator queues to think about
> (although it's not too pertinent these days).  A friend of mine had
> a job 15 years or so ago working on a commercial UNIX flavor.
> They were having some severe I/O performance problems under high
> loads.  The problem was that they were trying to sort and merge
> all the pending requests.  When I/O was heavy this ended up
> chewing all the CPU time and the kernel wouldn't actually be
> able to keep up sending requests to the disk :-)  However, keep
> in mind that this was in the bad old days when "hard drive"
> meant a rack-mount Eagle and the elevator was expected to consider
> things like head positioning and rotational speed when doing
> a sort...

LOL! My approach does avoid that, though. As a worst case, the elevator
just doesn't get a chance to sort anything: every request issued hits the
disk directly. If the disk is handling requests so fast that the CPU can't
keep up, we don't really want an elevator anyway...


So, in various scenarios:

* Light I/O - disk handling requests as they are issued. Disk may be idle
at times.

* Medium I/O - small backlog being elevator sorted as needed. Disk active
continuously, but all requests being handled OK.

* Heavy I/O - large backlog building up. Disk continuously active, some
processes may be starved or throttled, depending on priorities.


In the third case, sometimes we WILL want processes starved of I/O. If I
have three processes running - an MP3 player (high priority), a 'find' and
a low priority backup job, I DO want the backup job starved in favour of
the other two.


James.




(reiserfs) Re: More on 2.2.18pre2aa2 (summary of elevator ideas)

2000-09-13 Thread Hans Reiser

"Jeff V. Merkey" wrote:
> 
> One important point on remirroring I did not mention in my post.  In
> NetWare, remirroring scans the disk BACKWARDS (n -> 0) to prevent
> artificial starvation while remirroring is going on.  This was another
> optimization we learned the hard way by trying numerous approaches to
> the problem.
> 
> Jeff

Interesting.  Thanks Jeff.




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Andrea Arcangeli

On Wed, 13 Sep 2000, Martin Dalecki wrote:

>Andrea Arcangeli wrote:
>> 
>> On Tue, 12 Sep 2000, Martin Dalecki wrote:
>> 
>> >First of all: In the case of the mp3 player and such there is already
>> >a fine, proper way to give it better chances of getting its job done
>> >smoothly - RT kernel scheduler priorities and proper IO buffering. I
>> >did something similar for a GDI printer driver...
>> 
>> Take 2.2.15, set a buffer of 128mbyte (of course assume your mp3 is larger
>> than 128mbyte :) and then run in background `cp /dev/zero .` in the same
>> fs where your mp3 file out of cache is living. Then you'll see why a large
>> buffer is useless if there's none kind of I/O fair scheduling into the
>> elevator. Repeat the same test in 2.2.16 then.
>> 
>> The I/O latency Hans was taking about for the mp3 player, is the time it
>> takes for the buffer to become empty.
>
>I was talking about *proper* buffering, not necessarily *big* buffers.

Specify what you mean by "*proper* buffering".

If your buffer is shorter than the size of the file (and as it's a buffer,
it's intended to be shorter, since your mp3 file can be several gigabytes
long), then at some point you'll have to page in the new data from disk to
refill the buffer.

What will your proper buffering do when Linux doesn't complete the read
I/O to refill the buffer in time? (in time == before the buffer becomes
empty)

Andrea




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Andrea Arcangeli

On Wed, 13 Sep 2000, Mitchell Blank Jr wrote:

>The "large queue" goes against the whole point of this exercise - that
>is that if there are many items in the "queue" being sorted then
>unlucky requests can end up waiting a long time to get serviced.

Yep.

Andrea




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Andrea Arcangeli

On Wed, 13 Sep 2000, Alan Cox wrote:

>> Yes, but "how hard is it reasonable for the kernel to try" is based on
>> both items.  A good first order approximation is number of requests.
>
>I must strongly disagree with that claim. A request could be 512 bytes or
>128K. 

The current elevator accounts for the number of bh's that enter the bh
layer (we're not distinguishing between 1k and 4k soft blocksizes, though).

I was misleading (sorry) when I used the term "request": I didn't mean a
`struct request' but a new I/O request (1k/4k sized) pushed into the
blkdev layer.

Andrea




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Mitchell Blank Jr

James Sutherland wrote:
> In terms of latency, I'd suggest we aim to keep the device in use all the
> time we have outstanding requests: every time the device is ready to
> accept a request, we feed it the "next" one in the queue; until it is free
> again, requests pile up in the queue, being sorted, merged etc.
> 
> This should perform very well under light loads - requests just get passed
> to the device as fast as they can be handled - and under very heavy loads
> - we have a large queue in play, with plenty of scope for reordering,
> merges etc.

The "large queue" goes against the whole point of this exercise - that
is that if there are many items in the "queue" being sorted then
unlucky requests can end up waiting a long time to get serviced.
You want to have some reasonable break-off point where you just
decide to make an elevator swipe in order to avoid starving those
requests.

Keep in mind that I'm using the word "queue" in a confusing manner
here - we're talking about how much we'll try to elevator-sort in
a go, not the actual queue of all requests.  I've been calling this
a queue because from a performance standpoint that's what it is,
so I think it helps to think of the problem that way.

Oh yeah, one other thing about huge elevator queues to think about
(although it's not too pertinent these days).  A friend of mine had
a job 15 years or so ago working on a commercial UNIX flavor.
They were having some severe I/O performance problems under high
loads.  The problem was that they were trying to sort and merge
all the pending requests.  When I/O was heavy this ended up
chewing all the CPU time and the kernel wouldn't actually be
able to keep up sending requests to the disk :-)  However, keep
in mind that this was in the bad old days when "hard drive"
meant a rack-mount Eagle and the elevator was expected to consider
things like head positioning and rotational speed when doing
a sort...

-Mitch



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread James Sutherland

On Wed, 13 Sep 2000, Mitchell Blank Jr wrote:

> Alan Cox wrote:
> > > Yes, but "how hard is it reasonable for the kernel to try" is based on
> > > both items.  A good first order approximation is number of requests.
> > 
> > I must strongly disagree with that claim. A request could be 512 bytes or
> > 128K. 
> 
> Yeah, as sct pointed out this gets thorny.  For a modern harddrive this
> probably doesn't matter (since sequential I/O is SO fast compared to
> seek times) but for other devices it's an issue.

How about a simple "cost metric" for each request? (Each individual
request gets X "points", plus Y per block. Make Y almost zero for HDDs and
tape drives etc., make X and Y equal for solid state storage, say.)
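
A minimal sketch of such a cost metric (the constants just follow the rule
of thumb above - Y almost zero relative to X for seek-bound devices, X
equal to Y for solid state - and are otherwise made up):

    #include <stdio.h>

    struct dev_cost {
        const char *name;
        unsigned x;   /* fixed "points" per request */
        unsigned y;   /* "points" per block transferred */
    };

    static unsigned request_cost(const struct dev_cost *d, unsigned blocks)
    {
        return d->x + d->y * blocks;
    }

    int main(void)
    {
        struct dev_cost devs[] = {
            { "ssd",  1,      1 },   /* X and Y equal */
            { "hdd",  1000,   1 },   /* Y almost zero relative to X */
            { "tape", 100000, 1 },
        };

        /* compare one merged 128-block request vs. two 64-block requests */
        for (int i = 0; i < 3; i++)
            printf("%-4s merged=%u split=%u\n", devs[i].name,
                   request_cost(&devs[i], 128),
                   2 * request_cost(&devs[i], 64));
        return 0;
    }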

In terms of latency, I'd suggest we aim to keep the device in use all the
time we have outstanding requests: every time the device is ready to
accept a request, we feed it the "next" one in the queue; until it is free
again, requests pile up in the queue, being sorted, merged etc.

This should perform very well under light loads - requests just get passed
to the device as fast as they can be handled - and under very heavy loads
- we have a large queue in play, with plenty of scope for reordering,
merges etc.

> > > ...where the user sets a number explicitly for what performance they
> > > want.  Again, if we're going to make the user set this latency
> > 
> > No they do not. The parameters are defined by the bandwidth and measured 
> > behaviour.
> 
> Hmmm... well if someone wants to propose an algorithm to self-tune the
> "queue depth in milliseconds" number then maybe we'll get somewhere.
> You'd need to do some sort of moving averages of both requests/sec and
> sectors/sec that come out of the elevator and use those as feedback to
> adjust the queue-depth-value.  I'm not sure if this is advisable, but
> at least it sounds feasible.

With the metrics above, you should be able to calculate appropriate values
for X and Y to make the "cost" figures roughly correspond to actual times,
I think? The big question, of course, is what do you do when the queue
reaches the maximum - block the next process to make a request? Better,
block all but the highest "I/O priority" process? Then, I can go copying,
moving and deleting files left, right and centre without my MP3 player
ever skipping, which is nice :-)


James.




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Mitchell Blank Jr

Alan Cox wrote:
> > Yes, but "how hard is it reasonable for the kernel to try" is based on
> > both items.  A good first order approximation is number of requests.
> 
> I must strongly disagree with that claim. A request could be 512 bytes or
> 128K. 

Yeah, as sct pointed out this gets thorny.  For a modern harddrive this
probably doesn't matter (since sequential I/O is SO fast compared to
seek times) but for other devices it's an issue.

> > ...where the user sets a number explicitly for what performance they
> > want.  Again, if we're going to make the user set this latency
> 
> No they do not. The parameters are defined by the bandwidth and measured 
> behaviour.

Hmmm... well if someone wants to propose an algorithm to self-tune the
"queue depth in milliseconds" number then maybe we'll get somewhere.
You'd need to do some sort of moving averages of both requests/sec and
sectors/sec that come out of the elevator and use those as feedback to
adjust the queue-depth-value.  I'm not sure if this is advisable, but
at least it sounds feasible.
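
One possible shape for that self-tuning loop, as a hedged userspace sketch
(the smoothing factor and the 250ms latency target are assumptions, not
measured values):

    #include <stdio.h>

    /* exponential moving average: a cheap stand-in for "some sort of
       moving averages" of the rates coming out of the elevator */
    static double ema(double old, double sample, double alpha)
    {
        return old + alpha * (sample - old);
    }

    int main(void)
    {
        double req_per_sec = 100.0;     /* measured off the elevator */
        const double alpha = 0.1;       /* smoothing factor (assumed) */
        const double target_ms = 250.0; /* desired queue latency (assumed) */

        /* fold in one measurement interval's counter */
        req_per_sec = ema(req_per_sec, 140.0, alpha);

        /* queue depth that keeps roughly target_ms of work outstanding */
        int depth = (int)(req_per_sec * target_ms / 1000.0);
        printf("avg %.1f req/s -> queue depth %d\n", req_per_sec, depth);
        return 0;
    }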

-Mitch



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Chris Evans


On Wed, 13 Sep 2000, Matthew Kirkwood wrote:

> [0] Insert obligatory "why oh why" on Linus' not
> taking SCT's sar/iostat patches.

Indeed. It's a shame because 2.4 is gaining a relatively complete
"Enterprise checklist". It's a further shame because it's a non-intrusive
patch.

Yes, detailed I/O usage figures _are_ an Enterprise requirement. And no,
we don't have good enough statistics yet. A decent DBA will _require_
figures like % utilisation, average request service latency, average queue
depth, etc.

Cheers
Chris




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Alan Cox

> Yes, but "how hard is it reasonable for the kernel to try" is based on
> both items.  A good first order approximation is number of requests.

I must strongly disagree with that claim. A request could be 512 bytes or
128K. 

> It's still a queue - the queue of things we're going to take on this
> elevator swipe, right?  And the problem is one of keeping a sane
> watermark on this queue - not too many requests to destroy latency
> but enough to let the elevator do some good.

Yes. OK I agree there. If you want an efficiency bound then you need to consider
that.

> > Im talking about flow control/traffic shaping
> 
> ...where the user sets a number explicitly for what performance they
> want.  Again, if we're going to make the user set this latency

No they do not. The parameters are defined by the bandwidth and measured 
behaviour.




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Matthew Kirkwood

On Tue, 12 Sep 2000, Rik van Riel wrote:

[ outrageous cc: list trimmed ]

> > >We simply keep track of how old the oldest request
> > >in the queue is, and when that request is getting
> > >too old (say 1/2 second), we /stop/ all the others
> > 
> > Going in function of time is obviously wrong. A blockdevice can
> > write 1 request every two seconds or 1 request every msecond.
> > You can't assume anything in function of time _unless_ you have
> > per harddisk timing informations into the kernel.
> 
> Uhmmm, isn't the elevator about request /latency/ ?
> 
> And if so, in what unit do you want to measure
> latency if it isn't time?

Number of requests which get serviced between the
insertion and completion of the request.  Maybe
scaled by the average i/o queue length[0].

Matthew.

[0] Insert obligatory "why oh why" on Linus' not
taking SCT's sar/iostat patches.




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread James Sutherland

On Wed, 13 Sep 2000, Andrea Arcangeli wrote:

> On Wed, 13 Sep 2000, Mitchell Blank Jr wrote:
> 
> > The "large queue" goes against the whole point of this exercise - that
> > is that if there are many items in the "queue" being sorted then
> > unlucky requests can end up waiting a long time to get serviced.
> 
> Yep.

Ehmm... that large queue is a backlog of I/O requests. Since that's a
function of the workload and the disk speed, you can't reduce it without
either cutting the workload or speeding up the disk!

I suspect we were talking at cross purposes here a little. I was NOT
suggesting holding back requests to build up a big queue - I was talking
about the situation where we have a large backlog of requests pending.


James.




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Michael T. Babcock

----- Original Message -----
From: "Rik van Riel" [EMAIL PROTECTED]


> On Tue, 12 Sep 2000, Andrea Arcangeli wrote:
> > On Tue, 12 Sep 2000, Rik van Riel wrote:
> >
> > > Uhmmm, isn't the elevator about request /latency/ ?
> >
> > Yes, but definitely not absolute "time" latency.
>
> So what do you think about the idea Jeff Merkey
> presented? It seems to work well for Netware, so
> maybe we should try for Linux?

If I may ask a potentially stupid question, how can request latency be
anything but a factor of time?  Latency is how /long/ you (or the computer)
/waits/ for something.  That defines it as a function of time.

Just curious what the full comment being made is...
... because I'm suspicious that the terms are being confused, and that
there is some discussion about reducing drive access latency within the
context of not moving the heads too much, etc., without the context of the
user-space application waiting for data.




(reiserfs) Re: More on 2.2.18pre2aa2 (summary of elevator ideas)

2000-09-13 Thread Rogier Wolff

> "Jeff V. Merkey" wrote:
> > 
> > One important point on remirroring I did not mention in my post.  In
> > NetWare, remirroring scans the disk BACKWARDS (n -> 0) to prevent
> > artificial starvation while remirroring is going on.  This was another
> > optimization we learned the hard way by trying numerous approaches to
> > the problem.

Don't you want to do it a megabyte at a time to prevent abysmal
disk performance? (i.e. take the separate megabytes backwards, but do
every megabyte forwards)

instead of
        for (b = max; b > 0; b--)
do
        for (bb = max - STEP; bb >= 0; bb -= STEP)
                for (b = bb; b < bb + STEP; b++)
(sloppy coding! this is pseudocode, don't copy-paste.)

I expect about 120 IO operations of say 4k (480k per sec) out of a
disk if you read strictly backwards, while you can get about 18 IOs of
1MB per second (18MB per second) out of a disk that you run a
(forwards) megabyte at a time.

The extra latency in the "elevator" is then the time to read a
megabyte, about 50ms, which sounds acceptable to me.
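
A compilable rendering of that chunked backwards scan (block and chunk
sizes here are illustrative, and read_block() is a stand-in for the real
I/O):

    #include <stdio.h>

    #define MAX  1024    /* total blocks on the device (illustrative) */
    #define STEP 256     /* blocks per "megabyte" chunk (illustrative) */

    static void read_block(int b) { (void)b; /* real I/O would go here */ }

    int main(void)
    {
        for (int bb = MAX - STEP; bb >= 0; bb -= STEP)  /* chunks backwards */
            for (int b = bb; b < bb + STEP; b++)        /* blocks forwards */
                read_block(b);
        printf("scanned %d blocks in %d chunks\n", MAX, MAX / STEP);
        return 0;
    }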


Roger. 

-- 
** [EMAIL PROTECTED] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
*   Common sense is the collection of*
**  prejudices acquired by age eighteen.   -- Albert Einstein 




Re: (reiserfs) Re: More on 2.2.18pre2aa2 (summary of elevator ideas)

2000-09-13 Thread Jeff V. Merkey


You're welcome.

:-)

Jeff

Hans Reiser wrote:
> 
> "Jeff V. Merkey" wrote:
> 
> > One important point on remirroring I did not mention in my post.  In
> > NetWare, remirroring scans the disk BACKWARDS (n -> 0) to prevent
> > artificial starvation while remirroring is going on.  This was another
> > optimization we learned the hard way by trying numerous approaches to
> > the problem.
> > 
> > Jeff
> 
> Interesting.  Thanks Jeff.



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Ragnar Kjørstad

On Wed, Sep 13, 2000 at 11:22:16AM -0400, Michael T. Babcock wrote:
> If I may ask a potentially stupid question, how can request latency be
> anything but a factor of time?  Latency is how /long/ you (or the computer)
> /waits/ for something.  That defines it as a function of time.

Latency is of course a factor of time, but the point is that the
acceptable latency differs from device to device. For a slower device
a longer latency must be acceptable, and if the relationship is linear,
then using the number of requests may be a simpler and better way of doing
it.


Another potentially stupid question:
When the queue gets too long/old, new requests should be put in a new
queue to avoid starvation for the ones in the current queue, right?
So if this is done by time, how do you know when the oldest request gets
too old? You would need to index the requests both by sector and time,
and thus incur performance overhead, right?
If you, however, have a simple rule that at most 100 requests should be put
in each queue, it's easy to know when to start a new one. The number 100
should be found by calculating how many requests can be served within
the acceptable latency.
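
A toy sketch of that fixed-size rule (100 requests per queue, as above;
everything else is invented for illustration):

    #include <stdio.h>

    #define MAX_PER_QUEUE 100

    static int nrequests[16];   /* requests per queue */
    static int active;          /* queue currently being filled */

    static void insert_request(void)
    {
        if (nrequests[active] >= MAX_PER_QUEUE)
            active++;           /* current queue full: start a new one */
        nrequests[active]++;
    }

    int main(void)
    {
        for (int i = 0; i < 250; i++)
            insert_request();
        for (int q = 0; q <= active; q++)
            printf("queue %d: %d requests\n", q, nrequests[q]);
        return 0;
    }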


-- 
Ragnar Kjørstad



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Rik van Riel

On Wed, 13 Sep 2000, Ragnar Kjørstad wrote:
> On Wed, Sep 13, 2000 at 11:22:16AM -0400, Michael T. Babcock wrote:
> > If I may ask a potentially stupid question, how can request latency be
> > anything but a factor of time?  Latency is how /long/ you (or the computer)
> > /waits/ for something.  That defines it as a function of time.
> 
> Latency is of course a factor of time, but the point is that the
> acceptable latency differs from device to device. For a slower device
> a longer latency must be acceptable, and if the relationship is linear,
> then using the number of requests may be a simpler and better way of doing
> it.
> 
> Another potentially stupid question:
> When the queue gets too long/old, new requests should be put in
> a new queue to avoid starvation for the ones in the current
> queue, right?

Indeed. That would solve the problem...

> So if this is done by time, how do you know when the oldest
> request gets too old? You would need to index the requests both
> by sector and time, and thus incur performance overhead, right?
> 
> If you, however, have a simple rule that at most 100 requests should
> be put in each queue, it's easy to know when to start a new one.

The idea Jeff Merkey gave us is even simpler and should work
OK every time. I think I'll implement it RSN and see if it
works OK for Linux.

(maybe with the twist that we /do/ allow merges on the queue
that's being processed by the disk, but no insertions of new
requests)

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Ragnar Kjørstad

On Wed, Sep 13, 2000 at 08:08:51PM -0400, Giuliano Pochini wrote:
> > And if so, in what unit do you want to measure
> > latency if it isn't time?
> 
> I had a look at the new elevator. I hope I'm not wrong here... Requests
> are added to the queue scanning it starting from the tail. When the
> new request is found in-order with an existing one, it's inserted into
> the queue, unless it finds an existing request out of latency units. In
> that case the new rq is added before it (so it will be serviced after).

If I understand this correctly, that means once the queue is too long
(latency-wise) the elevator will stop working completely, because all
requests will be served in the order they come in, right?

If multiple queues are used, so that performance is OK even when the first
queue is full (too old), the discussion about how to tell if the queue
is too old or not (by time or requests) becomes less important. After
all, there is no reason not to start a new queue pretty often, even on a
fast device.

The correct behavior would be to reorder the new requests
individually even if the queue is already too long. In other words -
multiple queues.

> not work this way. The main problem is that it doesn't minimize head
> movement. For example, when a request comes for a sector at the
> beginning of the disk it goes to the beginning of the queue (it's
> ordered) so it will be serviced before other rqs (it picks up rqs from
> the head). This way, the higher the block number is, the more likely
> that rq will wait a lot.

If a new request is made for a sector before the current sector (the
sector of the last request to be served), the new request should be
added to the second queue even if the first one is not too old.
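
A tiny sketch of that insertion rule (current_sector and the two-queue
layout follow the discussion; the helper itself is invented):

    struct queue;   /* opaque here; we only choose between two queues */

    static struct queue *pick_queue(long sector, long current_sector,
                                    struct queue *first, struct queue *second)
    {
        /* a request behind the head position goes to the second queue,
           even if the first one is not yet too old */
        return (sector < current_sector) ? second : first;
    }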



-- 
Ragnar Kjørstad



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Rik van Riel

On Thu, 14 Sep 2000, Ragnar Kjørstad wrote:
> On Wed, Sep 13, 2000 at 06:57:21PM -0300, Rik van Riel wrote:
> > > Another potentially stupid question:
> > > When the queue gets too long/old, new requests should be put in
> > > a new queue to avoid starvation for the ones in the current
> > > queue, right?
> > 
> > Indeed. That would solve the problem...
> 
> I think something like this: (I don't know the current code, so maybe
> this suggestion is really stupid.)
> 
> struct request_queue {
>         int                     start_time;
>         struct request          *list;
>         struct request_queue    *next;
> };
> 
> struct request_queue *current_queue;
> 
> function too_old (struct request_queue *q) {
>         /* can actually easily be implemented using time, nr of requests
>            or any other measure of amount of work */
>         if (current_time > q->start_time + MAXTIME)
>                 return 1;
>         else
>                 return 0;
> }
> 
> function insert_request(struct request *new) {
>         struct request_queue *q = current_queue;
>         if (new->sectornr < current_sector)
>                 q = q->next;
>         while (too_old(q))
>                 q = q->next;
>         insert_request(q->list, new);
> }

This (IMHO) is wrong. When the drive has trouble keeping up
with requests in FCFS order, we should do some elevator sorting
to keep throughput high enough to deal with the requests we get.

Jeff's idea would take care of this, together with a *limited*
(say, 32 entries) queue size. Every time the "work" queue (the
one the disk is working on) is empty, we switch queues.

Processes get to put their request on the other queue, either
until the disk takes their requests (the queues are switched)
or until the queue is full. That way only the N highest priority
processes can get access to the disk if we're really overloaded
in a bad way.
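
A minimal userspace sketch of that two-queue scheme (the 32-entry limit
follows the text; everything else is invented, not the actual kernel
code):

    #include <stdio.h>

    #define QUEUE_LIMIT 32

    static int qlen[2];         /* request counts for the two queues */
    static int work;            /* index of the queue the disk drains */

    static int submit(void)     /* a process adds a request */
    {
        if (qlen[!work] >= QUEUE_LIMIT)
            return 0;           /* full: the caller must block */
        qlen[!work]++;
        return 1;
    }

    static void disk_take_one(void)
    {
        if (qlen[work] == 0)
            work = !work;       /* work queue drained: switch queues */
        if (qlen[work] > 0)
            qlen[work]--;       /* disk services one request */
    }

    int main(void)
    {
        int blocked = 0;
        for (int i = 0; i < 100; i++) {
            if (!submit())
                blocked++;      /* only the others keep getting through */
            if (i % 2 == 0)
                disk_take_one();
        }
        printf("blocked submitters: %d\n", blocked);
        return 0;
    }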

The current HUGE queue size is probably another reason for
the very bad latencies we sometimes see...

> > (maybe with the twist that we /do/ allow merges on the queue
> > that's being processed by the disk, but no insertions of new
> > requests)
> 
> I don't think you should allow merges. If one process is doing a big
> IO operation on a big file it would still get _all_ capacity, right?

Only if you were to allow unlimited merges ;)

I guess a 'decent' maximum size for one request would
be somewhere around 256kB, since this is the amount of
data a drive can read in the time of one disk seek...
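
(As a rough sanity check, assuming a ~10ms average seek and a ~25MB/s
media rate - plausible numbers for a fast drive of this era - a 256kB
transfer also takes about 10ms, so one maximal merged request costs on
the order of one extra seek.)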

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-13 Thread Ragnar Kjørstad

On Wed, Sep 13, 2000 at 08:08:12PM -0300, Rik van Riel wrote:
> On Thu, 14 Sep 2000, Ragnar Kjørstad wrote:
> > On Wed, Sep 13, 2000 at 06:57:21PM -0300, Rik van Riel wrote:
> > > > Another potentially stupid question:
> > > > When the queue gets too long/old, new requests should be put in
> > > > a new queue to avoid starvation for the ones in the current
> > > > queue, right?
> > > 
> > > Indeed. That would solve the problem...
> > 
> > I think something like this: (I don't know the current code, so maybe
> > this suggestion is really stupid.)
> > 
> > struct request_queue {
> >         int                     start_time;
> >         struct request          *list;
> >         struct request_queue    *next;
> > };
> > 
> > struct request_queue *current_queue;
> > 
> > function too_old (struct request_queue *q) {
> >         /* can actually easily be implemented using time, nr of requests
> >            or any other measure of amount of work */
> >         if (current_time > q->start_time + MAXTIME)
> >                 return 1;
> >         else
> >                 return 0;
> > }
> > 
> > function insert_request(struct request *new) {
> >         struct request_queue *q = current_queue;
> >         if (new->sectornr < current_sector)
> >                 q = q->next;
> >         while (too_old(q))
> >                 q = q->next;
> >         insert_request(q->list, new);
> > }
 
> This (IMHO) is wrong. When the drive has trouble keeping up
> with requests in FCFS order, we should do some elevator sorting
> to keep throughput high enough to deal with the requests we get.

The "list" in each queue should be sorted. So, even if the queue is
too old, the requests will still be sorted, but only within each queue.

> Jeff's idea would take care of this, together with a *limited*
> (say, 32 entries) queue size. Every time the "work" queue (the
> one the disk is working on) is empty, we switch queues.

OK, I called the "work" queue my "current" queue in the suggestion
above, but other than that it may not be so different.

> Processes get to put their request on the other queue, either
> until the disk takes their requests (the queues are switched)
> or until the queue is full. That way only the N highest priority
> processes can get access to the disk if we're really overloaded
> in a bad way.

I'm not sure I understand this.

New requests should always be put in the first queue that has room (is
not too old). This means that no request will be shifted too far off
its original order (in other words, it is not too unfair).

> > I don't think you should allow merges. If one process is doing a big
> > IO operation on a big file it would still get _all_ capacity, right?
> 
> Only if you were to allow unlimited merges ;)
> 
> I guess a 'decent' maximum size for one request would
> be somewhere around 256kB, since this is the amount of
> data a drive can read in the time of one disk seek...

Yes, this sounds like a good idea. (Don't know about the size though,
but testing will show)


-- 
Ragnar Kjørstad



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Mitchell Blank Jr

Alan Cox wrote:
> > time, but remember that there are two things measured in time here:
> >   A. The time for the whole queue of requests to run (this is what Rik is
> >  proposing using to throttle)
> >   B. The time an average request takes to process.
> 
> Your perceived latency is based entirely on A.

Yes, but "how hard is it reasonable for the kernel to try" is based on
both items.  A good first order approximation is number of requests.

> > If we limit on the depth of queue we're (to some level of approximation)
> > making our decision based on A/B.  It's still a magic constant, but at
> 
> I dont suggest you do queue limiting on that basis. I suggest you do order
> limiting based on time slots

It's still a queue - the queue of things we're going to take on this
elevator swipe, right?  And the problem is one of keeping a sane
watermark on this queue - not too many requests to destroy latency
but enough to let the elevator do some good.

> > Well, actually just about any communications protocol worth its salt
> > uses some sort of windowing throttle based on the amount of data
> 
> I'm talking about flow control/traffic shaping

...where the user sets a number explicitly for what performance they
want.  Again, if we're going to make the user set this latency
variable for each of their devices, then doing it based on time will
work great.

> > There's too many orders of magnitude difference between even just SCSI
> > disks (10 yr old drive?  16-way RAID?  Solid state?) to make
> > supplying any sort of default with the kernel impractical.  The end
> 
> The same argument is equally valid for the current scheme, and I think
> you'll find it equally bogus

There will always need to be tunables - and it's fine to say "if you've
got oddball hardware and/or workload and/or requirements then you should
twiddle this knob".  But it seems to me that the current scheme works
well for a pretty big range of devices.  If you do the setting based
on time, I think it'll be a lot more sensitive since there's nothing
that will scale based on the speed of the device.

-Mitch



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Martin Dalecki

Hans Reiser wrote:
> 
> I really think Rik has it right here.  In particular, an MP3 player needs
> to be able to say, I have X milliseconds of buffer so make my worst case
> latency X milliseconds.  The number of requests is the wrong metric,
> because the time required per request depends on disk geometry, disk
> caching, etc.
> 

No, the problem is that an application should either:

1. Take full control of the underlying system, or
2. Not care about self-tuning the OS.

Because that's what operating systems are for in the first place: letting
the applications run without caring about the underlying hardware.

Linux is just mistaken by design in having a generic elevator for any
block device, sitting on a single queue, for any kind of attached device.
Only device drivers know best how to handle queueing and stuff like this.
The upper layers should only care about the semantic correctness of the
request ordering, not about optimizing it.



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Alan Cox

> time, but remember that there are two things measured in time here:
>   A. The time for the whole queue of requests to run (this is what Rik is
>  proposing using to throttle)
>   B. The time an average request takes to process.

Your perceived latency is based entirely on A.

> If we limit on the depth of queue we're (to some level of approximation)
> making our decision based on A/B.  It's still a magic constant, but at

I don't suggest you do queue limiting on that basis. I suggest you do order
limiting based on time slots

> Well, actually just about any communications protocol worth its salt
> uses some sort of windowing throttle based on the amount of data

I'm talking about flow control/traffic shaping

> If we move to a "length of queue in time" as Rik suggests then we're
> going to have to MAKE the user set it manually for each device.

No

> There's too many orders of magnitude difference between even just SCSI
> disks (10 yr old drive?  16-way RAID?  Solid state?) to make
> supplying any sort of default with the kernel impractical.  The end

The same argument is equally valid for the current scheme, and I think
you'll find it equally bogus




Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Mitchell Blank Jr

Alan Cox wrote:
> > Now, I see people trying to introduce the concept of elapsed time into
> > that fix, which smells strongly of hack. How will this hack be cobbled
> 
> Actually my brain says that elapsed time based scheduling is the right thing
> to do.

No, Andrea is right here.  The argument that everyone is using ("Our target -
latency - is measured in time") is utterly bogus.  Yes, it's measured in
time, but remember that there are two things measured in time here:
  A. The time for the whole queue of requests to run (this is what Rik is
 proposing using to throttle)
  B. The time an average request takes to process.

If we limit on the depth of queue we're (to some level of approximation)
making our decision based on A/B.  It's still a magic constant, but at
least it's scaled to take into account the speed of the drive.  And
underneath, it's still based on time.

> It certainly works for networks

Well, actually just about any communications protocol worth its salt
uses some sort of windowing throttle based on the amount of data
outstanding, not the length of time it's been in the queue.  Which
is why TCP works well over both GigE and 28.8. [*]  Now substitute
"big fiberchannel RAID" for GigE and "360K floppy" for 28.8 and
you've got the same problem.

*  -- Yes, for optimal TCP over big WAN pipes you may want to use a
   larger buffer size, but that's a matter of the bandwidth
   delay product, which isn't relevant for talking about storage

If we move to a "length of queue in time" as Rik suggests then we're
going to have to MAKE the user set it manually for each device.
There's too many orders of magnitude difference between even just SCSI
disks (10 yr old drive?  16-way RAID?  Solid state?) to make
supplying any sort of default with the kernel impractical.  The end
result might be a bit better behaved, but only just slightly.
If people absolutely need this behavior for some reason, the current
algorithm should stay as the default.

-Mitch



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Martin Dalecki

Andrea Arcangeli wrote:
> 
> On Tue, 12 Sep 2000, Martin Dalecki wrote:
> 
> >First of all: In the case of the mp3 player and such there is already a
> >fine, proper way to give it better chances of getting its job done smoothly -
> >RT kernel scheduler priorities and proper IO buffering. I did something
> >similar to a GDI printer driver...
> 
> Take 2.2.15, set a buffer of 128mbyte (of course assume your mp3 is larger
> than 128mbyte :) and then run in background `cp /dev/zero .` in the same
> fs where your mp3 file out of cache is living. Then you'll see why a large
> buffer is useless if there's no kind of fair I/O scheduling in the
> elevator. Repeat the same test in 2.2.16 then.
> 
> The I/O latency Hans was talking about for the mp3 player, is the time it
> takes for the buffer to become empty.

I was talking about *proper* buffering, not necessarily *big* buffers.



(reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Andrea Arcangeli

On Tue, 12 Sep 2000, Martin Dalecki wrote:

>First of all: In the case of the mp3 player and such there is already a
>fine, proper way to give it better chances of getting its job done smoothly -
>RT kernel scheduler priorities and proper IO buffering. I did something
>similar to a GDI printer driver...

Take 2.2.15, set a buffer of 128mbyte (of course assume your mp3 is larger
than 128mbyte :) and then run in background `cp /dev/zero .` in the same
fs where your mp3 file out of cache is living. Then you'll see why a large
buffer is useless if there's no kind of fair I/O scheduling in the
elevator. Repeat the same test in 2.2.16 then.

The I/O latency Hans was talking about for the mp3 player, is the time it
takes for the buffer to become empty.

>device driver level, if at all. In fact you already have bad
>interactions between strategies of low level drivers and the high
>level code in Linux - like for example the "get from top of queue" or 
>"don't get it from top of the IO queue" mess between
>IDE and SCSI middlelayers... (However this got a bit better recently.)

That's historic cruft; it's unrelated to controlling the elevator
algorithm on a per-task/per-file basis IMHO.

Andrea





Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Alan Cox

> Why do you say it's not been fixed? Can you still reproduce hangs as long as
> a write(2) can write? I certainly can't.

I can't reproduce long hangs. I'm not seeing as good I/O throughput as
before, but right now I'm quite happy with the tradeoff. If someone can
make it better then I'm happier still.




(reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Rik van Riel

On Tue, 12 Sep 2000, Andrea Arcangeli wrote:
> On Tue, 12 Sep 2000, Rik van Riel wrote:
> 
> >Also, this possibility is /extremely/ remote, if not
> >impossible. Well, it could happen at one point in time,
> 
> It's not impossible. Think of when you run a backup of your home
> directory while you're listening to an mp3. Both `tar` and `xmms` will
> read the same file that is out of cache.
> 
> `tar` will be the first one to read the next out-of-cache
> data-page of the file. The I/O will thus be issued with low prio,
> but then, as soon as `tar` has issued the read I/O, `xmms` will
> also wait on the same page, and it will skip the next deadline
> because the I/O has been issued with low prio.

Indeed, this could be an issue...

> To make it work right is not simple.

I don't know if we really have to care about this
case. The process queueing the IO is more than
likely a good guess, and a good guess is (IMHO)
better than not guessing at all and hoping things
will be ok.

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/





(reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Rik van Riel

On Tue, 12 Sep 2000, Martin Dalecki wrote:

> Second: The concept of time can give you very very nasty
> behaviour in even cases. [integer arithmetic]

Point taken.

> Third: All you try to improve is the boundary case between an
> entirely overloaded system and a system which has a huge reserve
> to get the task done. I don't think you can find any
> "improvement" which will not just improve some cases and hurt
> some only slightly different cases badly. That's basically the
> same problem as with the paging strategy to follow. (However we
> have some kind of "common sense" in respect of this, despite the
> fact that linux does ignore it...)

Please don't ignore my VM work  ;)
http://www.surriel.com/patches/

> Fourth: The most common solution for such boundary cases is some
> notion of cost optimization, like the nice value of a process or
> page age for example, or alternatively some kind of choice between
> entirely different strategies (remember the term strategy
> routine) - all of them are just *relative* measures, not
> absolute time constraints.

Indeed, we'll need to work with relative measures to
make sure both throughput and latency are OK. Some kind
of (very simple) self-tuning system is probably best here.

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/





Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Andrea Arcangeli

On Tue, 12 Sep 2000, Chris Evans wrote:

>the elevator code. Keep it to a queue management system, and suddenly it
>scales to slow or fast devices without any gross device-type specific
>tuning.

Yep, that was the object.

Andrea




(reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Andrea Arcangeli

On Tue, 12 Sep 2000, Rik van Riel wrote:

>But you don't. Transfer rate is very much dependant on the
>kind of load you're putting on the disk...

Transfer rate means `hdparm -t` in single user mode. Try it and you'll see
that you always get the same result.

>Throughput really isn't that relevant here. The problems are

Throughput is relevant. Again, how do you get a 1msec latency out of a
blockdevice that writes 1 request every two seconds?

>With equally horrible results for most machines I've
>seen. For a while I actually thought the bug /must/
>have been somewhere else because I saw processes
>'hanging' for about 10 minutes before making progress
>again ...

As said in my earlier email the current 2.4.x elevator scheduler is
_disabled_. I repeat: you should change include/linux/elevator.h and set
the read and write latency to 250 and 500 respectively. You won't get
latency as good as in test1, but it won't hang for 10 minutes.

Jens and I sent Linus a patch to re-enable that, fixing some other
problems with blkdev merging reported by Giuliano Pochini, plus wake-one
flow control. I also fixed the DMA_CHUNK_SIZE thing that was buggy and
is still buggy in the sparc64 case (I didn't fix sparc64 in my patch; the
DMA_CHUNK_SIZE should limit the segments to max(64, SHpnt->sg_tablesize)
and not to 64 as now). It also allowed 512K-sized SCSI commands, plus some
other minor things, but it didn't get merged.

>Not really. What about just using something like
>"half a second, but at least 10 requests liberty to
>reorder" ?

It doesn't make sense.

Andrea





(reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Andrea Arcangeli

On Tue, 12 Sep 2000, Rik van Riel wrote:

>Why do you always come up with impossible examples?

That's not impossible. impossible != unlikely.

A more regular case is when you have an extremely fast device, where a 1/2
second latency is too much; using 100msec could be just enough to provide
good tiotest numbers, and you would get better latency with the current
elevator than with the 1/2-second hardwired value.

>What I'd like to see is an elevator where the settings set
>by the user have a direct influence in the behaviour observed.

I can see direct influence. (more in the past but also now)

>Doing time-based request sorting should give us that behaviour
>and a default of say 1/2 second would work fine for all the
>hard disks and cdrom drives I've seen in the last 6 years.

Well, changing that is very easy: you only have to change the unit of
measure without changing one bit of the algorithm as it's implemented
today.

Just assume req->elevator_sequence is calculated in jiffies, and in
elevator.c change the check for `!req->elevator_sequence' to
`time_before(req->elevator_sequence, jiffies)'. Then of course change the
initialization of req->elevator_sequence to be done with `jiffies +
elevator->read_latency'. Then elvtune will also talk in jiffies and not in
requests.
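
A rough userspace mock of that change (the req/elevator field names follow
the description above; the scaffolding around them is invented, not a real
patch):

    #include <stdio.h>

    typedef unsigned long jiffies_t;
    /* wrap-safe comparison, same idea as the kernel's time_before() */
    #define time_before(a, b) ((long)((a) - (b)) < 0)

    struct elevator { jiffies_t read_latency; };
    struct request  { jiffies_t elevator_sequence; };

    int main(void)
    {
        jiffies_t jiffies = 1000;   /* pretend tick counter */
        struct elevator e = { 250 };
        struct request req;

        /* was a request count; now a deadline in jiffies */
        req.elevator_sequence = jiffies + e.read_latency;

        jiffies += 300;             /* time passes */

        /* was: if (!req.elevator_sequence) ... */
        if (time_before(req.elevator_sequence, jiffies))
            printf("request out of latency: stop reordering past it\n");
        return 0;
    }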

>Of course you can dream up an imaginary device where it won't
>work, but in that case there's always the possibility of tuning
>the elevator (like we do right now) ...

I don't like having to tune things by hand for the unlikely case. Note
that the unit-of-bytes thing that we have now may also have to be tuned
for some cases, but it's even less likely that you have to do that to get
good throughput.

Andrea





(reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Alan Cox

> That problem: the original elevator code did not schedule I/O particularly
> fairly under certain I/O usage patterns. So it got fixed.

No it got hacked up a bit.

> Now, I see people trying to introduce the concept of elapsed time into
> that fix, which smells strongly of hack. How will this hack be cobbled

Actually my brain says that elapsed time based scheduling is the right thing
to do. It certainly works for networks





Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Chris Evans


On Tue, 12 Sep 2000, Alan Cox wrote:

> > Now, I see people trying to introduce the concept of elapsed time into
> > that fix, which smells strongly of hack. How will this hack be cobbled
> 
> Actually my brain says that elapsed time based scheduling is the right
> thing to do. It certainly works for networks

Interesting, I'll try and run with this. The mention of networks reminds
me that any "max service time" variable is a tunable quantity depending on
current conditions...

.. and sct's block device I/O accounting patches give us the current
average request service time on a per-device basis. Multiply that up a bit
and maybe you have your threshold for moving things to the head of the
queue.
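
A back-of-the-envelope sketch of that heuristic (every name here is
hypothetical: sct's accounting patches aren't in the stock tree, so this
only illustrates the idea):

    /* Hypothetical per-device figure maintained by the accounting code. */
    struct blkdev_stats {
            unsigned long avg_service_time;  /* mean service time, in jiffies */
    };

    /* "Multiply that up a bit": derive the head-of-queue threshold from
     * measured behaviour instead of a hardwired constant. */
    static unsigned long head_of_queue_threshold(struct blkdev_stats *st)
    {
            return 4 * st->avg_service_time;  /* the factor of 4 is a guess */
    }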

Cheers
Chris

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Martin Dalecki

Chris Evans wrote:
> 
> On Tue, 12 Sep 2000, Hans Reiser wrote:
> 
> > I really think Rik has it right here.  In particular, an MP3 player
> > needs to be able to say, I have X milliseconds of buffer so make my
> > worst case latency X milliseconds.  The number of requests is the
> > wrong metric, because the time required per request depends on disk
> > geometry, disk caching, etc.
> 
> Hi,
> 
> We need to separate "what's a good idea" from "what's the best way to do
> it".
> 
> Sure, it's a good idea to get an mp3 player's I/O request serviced within
> a certain amount of time. Is the best way to do that hacking the concept
> of time into the elevator algorithm? That's currently under debate.
> 
> In fact, the guarantee of I/O service time for a single process (mp3
> player) is pretty orthogonal to the per-device elevator settings. If you
> have a certain block device set to a max latency of 0.25s, and lots of
> processes are hammering the disk, then something will have to give,
> i.e. under heavy load this setting will be useless and not honoured.
> 
> The solution to this particular mp3 player scenario would be something
> like the task scheduler policies we have. For example, maybe we could flag
> a given process so that all its I/O requests go to the head of the queue.
> 
> Cheers
> Chris

First of all: in the case of the mp3 player and such, there is already a
fine, proper way to give it better chances of getting its job done
smoothly: RT kernel scheduler priorities and proper I/O buffering. I did
something similar for a GDI printer driver...

Second: the concept of time can give you very, very nasty behaviour even
in simple cases. Assume that a disc can only do 1 request per second, and
then imagine scheduling based on 1+epsilon seconds... basically the disc
will be run at half the speed it could manage. Nasty integer arithmetic
like this can catch you easily, and almost always entirely unexpectedly.

Third: all you are trying to improve is the boundary case between an
entirely overloaded system and a system which has a huge reserve to get
the task done. I don't think you can find any "improvement" which will not
just improve some cases and hurt some only slightly different cases badly.
That's basically the same problem as with the paging strategy to follow.
(However, we have some kind of "common sense" in respect of this, despite
the fact that Linux ignores it...)

Fourth: the most common solution for such boundary cases is some notion of
cost optimization, like the nice value of a process or page age, for
example, or alternatively some kind of choice between entirely different
strategies (remember the term "strategy routine") - all of them are just
*relative* measures, not absolute time constraints.

Fifth: I think this kind of I/O behaviour control isn't generic enough for
the elevator - it should all be done at the device driver level, if at
all. In fact you already have bad interactions between the strategies of
the low-level drivers and the high-level code in Linux - like, for
example, the "get from top of queue" versus "don't get it from top of the
I/O queue" mess between the IDE and SCSI midlayers... (However, this got a
bit better recently.)

-- 
- phone: +49 214 8656 283
- job:   STOCK-WORLD Media AG, LEV .de (MY OPPINNIONS ARE MY OWN!)
- langs: de_DE.ISO8859-1, en_US, pl_PL.ISO8859-2, last ressort:
ru_RU.KOI8-R
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



(reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Rik van Riel

On Tue, 12 Sep 2000, Andrea Arcangeli wrote:
> On Tue, 12 Sep 2000, Rik van Riel wrote:
> 
> >As an approximation, you could take the task that queues the
> >request. Only kswapd, kflushd and kupdate would be the "false
> 
> That would probably be ok, but if we do that I'd prefer to do it without
> guesses.
> 
> The approximation would be wrong, for example, if you have two
> tasks reading the same file: one task generates the readahead but
> doesn't read it, the other task only blocks waiting for the
> readahead to complete.

Do we really care about the case that doesn't generate
disk seeks? ;)

Also, this possibility is /extremely/ remote, if not
impossible. Well, it could happen at one point in time,
but it's not something both tasks can keep up for more
than a second...

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



(reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Andrea Arcangeli

On Tue, 12 Sep 2000, Chris Evans wrote:

>like the task scheduler policies we have. For example, maybe we could flag
>a given process so that all its I/O requests go to the head of the queue.

Yes. We can't do that at the moment because at the elevator layer we don't
know who is waiting for I/O completion.

Andrea


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Chris Evans


On Tue, 12 Sep 2000, Hans Reiser wrote:

> I really think Rik has it right here.  In particular, an MP3 player
> needs to be able to say, I have X milliseconds of buffer so make my
> worst case latency X milliseconds.  The number of requests is the
> wrong metric, because the time required per request depends on disk
> geometry, disk caching, etc.

Hi,

We need to separate "what's a good idea" from "what's the best way to do
it".

Sure, it's a good idea to get an mp3 player's I/O request serviced within
a certain amount of time. Is the best way to do that hacking the concept
of time into the elevator algorithm? That's currently under debate.


In fact, the guarantee of I/O service time for a single process (mp3
player) is pretty orthogonal to the per-device elevator settings. If you
have a certain block device set to a max latency of 0.25s, and lots of
processes are hammering the disk, then something will have to give,
i.e. under heavy load this setting will be useless and not honoured.

The solution to this particular mp3 player scenario would be something
like the task scheduler policies we have. For example, maybe we could flag
a given process so that all its I/O requests go to the head of the queue.

Cheers
Chris

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Hans Reiser

I really think Rik has it right here.  In particular, an MP3 player needs
to be able to say: I have X milliseconds of buffer, so make my worst-case
latency X milliseconds.  The number of requests is the wrong metric,
because the time required per request depends on disk geometry, disk
caching, etc.

Hans
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Chris Evans


On Tue, 12 Sep 2000, Rik van Riel wrote:

> On Tue, 12 Sep 2000, Andrea Arcangeli wrote:
> > On Tue, 12 Sep 2000, Rik van Riel wrote:
> > 
> > >Uhmmm, isn't the elevator about request /latency/ ?
> > 
> > Yes, but definitely not absolute "time" latency.
> > 
> > How do you get a 1msec latency for a read request out of a
> > blockdevice that writes 1 request in 2 seconds? See?
> 
> Of course, if you set a ridiculous latency figure you'll
> get ridiculously bad performance. However, that doesn't
> say /anything/ about whether the idea is a good one or not...

People,

Remember why the elevator algorithm was changed in the first place? It was
introduced to solve a very specific problem.

That problem: the original elevator code did not schedule I/O particularly
fairly under certain I/O usage patterns. So it got fixed.

Now, I see people trying to introduce the concept of elapsed time into
that fix, which smells strongly of hack. How will this hack be cobbled
into the elevator code so that it copes with block devices from fast RAID
arrays to slow floppies to network block devices?

So I have to agree with Andrea that the concept of time does not belong in
the elevator code. Keep it to a queue management system, and suddenly it
scales to slow or fast devices without any gross device-type specific
tuning.

Cheers
Chris

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Andrea Arcangeli

On Tue, 12 Sep 2000, Chris Evans wrote:

> the elevator code. Keep it to a queue management system, and suddenly it
> scales to slow or fast devices without any gross device-type specific
> tuning.

Yep, that was the object.

Andrea

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



(reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Rik van Riel

On Tue, 12 Sep 2000, Martin Dalecki wrote:

> Second: the concept of time can give you very, very nasty
> behaviour even in simple cases. [integer arithmetic]

Point taken.

> Third: all you are trying to improve is the boundary case between an
> entirely overloaded system and a system which has a huge reserve
> to get the task done. I don't think you can find any
> "improvement" which will not just improve some cases and hurt
> some only slightly different cases badly. That's basically the
> same problem as with the paging strategy to follow. (However, we
> have some kind of "common sense" in respect of this, despite the
> fact that Linux ignores it...)

Please don't ignore my VM work  ;)
http://www.surriel.com/patches/

> Fourth: the most common solution for such boundary cases is some
> notion of cost optimization, like the nice value of a process or
> page age, for example, or alternatively some kind of choice between
> entirely different strategies (remember the term "strategy
> routine") - all of them are just *relative* measures, not
> absolute time constraints.

Indeed, we'll need to work with relative measures to
make sure both throughput and latency are OK. Some kind
of (very simple) self-tuning system is probably best here.

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



(reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Rik van Riel

On Tue, 12 Sep 2000, Andrea Arcangeli wrote:
> On Tue, 12 Sep 2000, Rik van Riel wrote:
> 
> > Also, this possibility is /extremely/ remote, if not
> > impossible. Well, it could happen at one point in time,
> 
> It's not impossible. Think of when you run a backup of your home
> directory while you're listening to an mp3. Both `tar` and `xmms` will
> read the same file that is out of cache.
> 
> `tar` will be the first one to read the next out-of-cache
> data page of the file. The I/O will therefore be issued with low prio,
> but then, as soon as `tar` has issued the read I/O, `xmms` will also
> wait on the same page, and it will miss its next deadline
> because the I/O has been issued with low prio.

Indeed, this could be an issue...

> To make it work right is not simple.

I don't know if we really have to care about this
case. The process queueing the IO is more than
likely a good guess, and a good guess is (IMHO)
better than not guessing at all and hoping things
will be ok.

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
   -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/   http://www.surriel.com/


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Alan Cox

> Why do you say it's not been fixed? Can you still reproduce hangs as long
> as a write(2) can write? I certainly can't.

I can't reproduce long hangs. I'm not seeing as good I/O throughput as
before, but right now I'm quite happy with the tradeoff. If someone can
make it better, then I'm happier still.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



(reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Andrea Arcangeli

On Tue, 12 Sep 2000, Martin Dalecki wrote:

> First of all: in the case of the mp3 player and such, there is already a
> fine, proper way to give it better chances of getting its job done
> smoothly: RT kernel scheduler priorities and proper I/O buffering. I did
> something similar for a GDI printer driver...

Take 2.2.15, set a buffer of 128mbyte (of course, assume your mp3 is larger
than 128mbyte :) and then run `cp /dev/zero .` in the background in the
same fs where your out-of-cache mp3 file is living. Then you'll see why a
large buffer is useless if there's no kind of fair I/O scheduling in the
elevator. Repeat the same test in 2.2.16.

The I/O latency Hans was talking about for the mp3 player is the time it
takes for the buffer to become empty.
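
To put a rough number on that latency budget (invented figures, purely for
illustration): a 1 MB buffer of 128 kbit/s audio drains in about

    1 MB / (128 kbit/s) = 1,048,576 bytes / 16,384 bytes/s ~= 64 seconds

so the elevator only has to service the player's next read somewhere inside
that window to avoid a skip.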

> device driver level, if at all. In fact you already have bad
> interactions between the strategies of the low-level drivers and the
> high-level code in Linux - like, for example, the "get from top of
> queue" versus "don't get it from top of the I/O queue" mess between
> the IDE and SCSI midlayers... (However, this got a bit better recently.)

That's historic cruft; it's unrelated to controlling the elevator
algorithm on a per-task/per-file basis, IMHO.

Andrea


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Martin Dalecki

Andrea Arcangeli wrote:
 
> On Tue, 12 Sep 2000, Martin Dalecki wrote:
> 
> > First of all: in the case of the mp3 player and such, there is already
> > a fine, proper way to give it better chances of getting its job done
> > smoothly: RT kernel scheduler priorities and proper I/O buffering. I
> > did something similar for a GDI printer driver...
> 
> Take 2.2.15, set a buffer of 128mbyte (of course, assume your mp3 is
> larger than 128mbyte :) and then run `cp /dev/zero .` in the background
> in the same fs where your out-of-cache mp3 file is living. Then you'll
> see why a large buffer is useless if there's no kind of fair I/O
> scheduling in the elevator. Repeat the same test in 2.2.16.
> 
> The I/O latency Hans was talking about for the mp3 player is the time it
> takes for the buffer to become empty.

I was talking about *proper* buffering, not necessarily *big* buffers.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Mitchell Blank Jr

Alan Cox wrote:
> > Now, I see people trying to introduce the concept of elapsed time into
> > that fix, which smells strongly of hack. How will this hack be cobbled
> 
> Actually my brain says that elapsed time based scheduling is the right
> thing to do.

No, Andrea is right here.  The argument that everyone is using ("Our target -
latency - is measured in time") is utterly bogus.  Yes, it's measured in
time, but remember that there are two things measured in time here:
  A. The time for the whole queue of requests to run (this is what Rik is
 proposing using to throttle)
  B. The time an average request takes to process.

If we limit on the depth of queue we're (to some level of approximation)
making our decision based on A/B.  It's still a magic constant, but at
least it's scaled to take into account the speed of the drive.  And
underneath, it's still based on time.
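
To make that ratio concrete (round numbers, invented purely for
illustration): a fixed depth limit of N requests implies a worst-case
queue latency of roughly

    A ~= N * B

    N = 50 requests:
        disk (B ~= 10 ms per request)   ->  A ~= 0.5 s
        SSD  (B ~= 0.1 ms per request)  ->  A ~= 5 ms

so the same request-count limit automatically translates into a much
tighter time bound on a faster device.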

 It certainly works for networks

Well, actually just about any communications protocol worth its salt
uses some sort of windowing throttle based on the amount of data
outstanding, not the length of time it's been in the queue.  Which
is why TCP works well over both GigE and 28.8. [*]  Now substitute
"big fiberchannel RAID" for GigE and "360K floppy" for 28.8 and
you've got the same problem.

*  -- Yes, for optimal TCP over big WAN pipes you may want to use a
   larger buffer size, but that's a matter of the bandwidth-delay
   product, which isn't relevant when talking about storage.

If we move to a "length of queue in time" as Rik suggests then we're
going to have to MAKE the user set it manually for each device.
There are too many orders of magnitude difference between even just SCSI
disks (10 yr old drive?  16-way RAID?  Solid state?) to make
supplying any sort of default with the kernel impractical.  The end
result might be a bit better behaved, but only just slightly.
If people absolutely need this behavior for some reason, the current
algorithm should stay as the default.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Alan Cox

> time, but remember that there are two things measured in time here:
>   A. The time for the whole queue of requests to run (this is what Rik is
>      proposing using to throttle)
>   B. The time an average request takes to process.

Your perceived latency is based entirely on A.

> If we limit on the depth of queue we're (to some level of approximation)
> making our decision based on A/B.  It's still a magic constant, but at

I don't suggest you do queue limiting on that basis. I suggest you do
order limiting based on time slots.

> Well, actually just about any communications protocol worth its salt
> uses some sort of windowing throttle based on the amount of data

I'm talking about flow control/traffic shaping.

> If we move to a "length of queue in time" as Rik suggests then we're
> going to have to MAKE the user set it manually for each device.

No.

> There are too many orders of magnitude difference between even just SCSI
> disks (10 yr old drive?  16-way RAID?  Solid state?) to make
> supplying any sort of default with the kernel impractical.  The end

The same argument is equally valid for the current scheme, and I think
you'll find it equally bogus.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Martin Dalecki

Hans Reiser wrote:
 
> I really think Rik has it right here.  In particular, an MP3 player
> needs to be able to say: I have X milliseconds of buffer, so make my
> worst-case latency X milliseconds.  The number of requests is the wrong
> metric, because the time required per request depends on disk geometry,
> disk caching, etc.

No, the problem is that an application should either:

1. Take full control of the underlying system.
2. Not care about self-tuning the OS.

Because that's what operating systems are for in the first place: letting
the applications run without caring about the underlying hardware.

Linux is just mistaken by design in having a generic elevator for any
block device, sitting on a single queue, for any kind of attached device.
Only device drivers know best how to handle queueing and stuff like this.
The upper layers should only care about the semantic correctness of the
request order, not about optimizing it.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-12 Thread Mitchell Blank Jr

Alan Cox wrote:
> > time, but remember that there are two things measured in time here:
> >   A. The time for the whole queue of requests to run (this is what Rik
> >      is proposing using to throttle)
> >   B. The time an average request takes to process.
> 
> Your perceived latency is based entirely on A.

Yes, but "how hard is it reasonable for the kernel to try" is based on
both items.  A good first order approximation is number of requests.

> > If we limit on the depth of queue we're (to some level of approximation)
> > making our decision based on A/B.  It's still a magic constant, but at
> 
> I don't suggest you do queue limiting on that basis. I suggest you do
> order limiting based on time slots.

It's still a queue - the queue of things we're going to take on this
elevator sweep, right?  And the problem is one of keeping a sane
watermark on this queue - not so many requests that latency is destroyed,
but enough to let the elevator do some good.

> > Well, actually just about any communications protocol worth its salt
> > uses some sort of windowing throttle based on the amount of data
> 
> I'm talking about flow control/traffic shaping.

...where the user sets a number explicitly for the performance they
want.  Again, if we're going to make the user set this latency
variable for each of their devices, then doing it based on time will
work great.

> > There are too many orders of magnitude difference between even just SCSI
> > disks (10 yr old drive?  16-way RAID?  Solid state?) to make
> > supplying any sort of default with the kernel impractical.  The end
> 
> The same argument is equally valid for the current scheme, and I think
> you'll find it equally bogus.

There will always need to be tunables - and it's fine to say "if you've
got oddball hardware and/or workload and/or requirements then you should
twiddle this knob".  But it seems to me that the current scheme works
well for a pretty big range of devices.  If you do the setting based
on time, I think it'll be a lot more sensitive since there's nothing
that will scale based on the speed of the device.

-Mitch
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-11 Thread Michael T. Babcock

Considering there are a lot of people still using 2.0.x because they find it
more stable than the 2.2.x series, doesn't it make sense to give this
scalability to people who are already running SMP boxes on 2.2.x and who may
decide to use ReiserFS?

- Original Message -
From: "Andrea Arcangeli" <[EMAIL PROTECTED]>


> On Mon, 11 Sep 2000, Andi Kleen wrote:
>
> >BTW, there is another optimization that could help reiserfs a lot
> >on SMP settings: do a unlock_kernel()/lock_kernel() around the user
> >copies. It is quite legal to do that (you have to handle sleeping
> >anyways in case of a page fault), and it allows CPUs to run in parallel
> >for long running copies.
>
> I'd prefer not to spend time making 2.2.x scale better in SMP; 2.4.x
> just fixed that problem by dropping the big lock in the first place in
> the read/write paths :). The copy-user reschedule points, instead, were
> bugfixes.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-11 Thread Andrea Arcangeli

On Mon, 11 Sep 2000, Andi Kleen wrote:

>BTW, there is another optimization that could help reiserfs a lot 
>on SMP settings: do a unlock_kernel()/lock_kernel() around the user 
>copies. It is quite legal to do that (you have to handle sleeping
>anyways in case of a page fault), and it allows CPUs to run in parallel
>for long running copies.

I'd prefer not to spend time making 2.2.x scale better in SMP; 2.4.x
just fixed that problem by dropping the big lock in the first place in
the read/write paths :). The copy-user reschedule points, instead, were
bugfixes.

Andrea

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-11 Thread Andi Kleen

On Mon, Sep 11, 2000 at 08:15:15AM -0400, Chris Mason wrote:
> LFS changes for filldir, reiserfs_readpage, and adds limit checking in 
> file_write to make sure we don't go above 2GB (Andi Kleen).  Also fixes 
> include/linux/fs.h, which does not patch cleanly for 3.5.25 because of usb.
> 
> Note, you might see debugging messages about items moving during 
> copy_from_user.  These are safe, but I'm leaving them in for now as I'd 
> like to find out why copy_from_user is suddenly scheduling much more than 
> it used to.

That's easy to explain. Andrea's latest aa contains some low-latency
patches, which add an `if (current->need_resched) schedule()` to copy*user
to avoid bad scheduling latencies for big copies. The result is that you
see a lot more schedules, every time the copy*user happens to hit the
end of a time slice.

BTW, there is another optimization that could help reiserfs a lot 
on SMP settings: do a unlock_kernel()/lock_kernel() around the user 
copies. It is quite legal to do that (you have to handle sleeping
anyways in case of a page fault), and it allows CPUs to run in parallel
for long running copies.
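
A minimal sketch of that pattern (assuming 2.2-style locking: the caller
holds the big kernel lock, and the function and buffer names are made up):

    #include <linux/smp_lock.h>
    #include <asm/uaccess.h>

    /* Drop the big kernel lock around a potentially long user copy so that
     * other CPUs can enter the kernel meanwhile; the copy may sleep on a
     * page fault anyway, so the caller already has to tolerate sleeping. */
    static long copy_in_parallel(char *kbuf, const char *ubuf, size_t count)
    {
            unsigned long not_copied;

            unlock_kernel();
            not_copied = copy_from_user(kbuf, ubuf, count);
            lock_kernel();

            return not_copied ? -EFAULT : (long)count;
    }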


-Andi

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: (reiserfs) Re: More on 2.2.18pre2aa2

2000-09-11 Thread Andrea Arcangeli

On Mon, 11 Sep 2000, Chris Mason wrote:

>reiserfs-3.5.25, this patch.  I tested against pre3-aa2.

BTW, pre3-aa2 means 2.2.18pre2aa2.bz2 applied on top of 2.2.18pre3.

>Note, you might see debugging messages about items moving during 
>copy_from_user.  These are safe, but I'm leaving them in for now as I'd 
>like to find out why copy_from_user is suddenly scheduling much more than 
>it used to.

In 2.2.18pre2aa2.bz2 there's a latency bugfix, now a:

read(fd, buf, 0x7fffffff)
write(fd, buf, 0x7fffffff)
sendfile(src, dst, NULL, 0x7fffffff)

doesn't hang the machine for several seconds anymore. (Well, really all
three are still a bit buggy, because they don't interrupt themselves when
a signal arrives, so we can't use sendfile for `cp` of files smaller than
2g yet... but at least now you can do something in parallel, even if you
are so lucky as to have some gigabytes of fs cache.)

Andrea

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



  1   2   >