Re: cfq performance gap

2006-12-13 Thread Miquel van Smoorenburg
In article <[EMAIL PROTECTED]>,
Chen, Kenneth W <[EMAIL PROTECTED]> wrote:
>Miquel van Smoorenburg wrote on Wednesday, December 13, 2006 1:57 AM
>> Chen, Kenneth W <[EMAIL PROTECTED]> wrote:
>> >This rawio test plows through sequential I/O, handing each small record
>> >to the threads round-robin (modulo the number of threads).  So each thread
>> >appears non-contiguous within its own process context, while the overall
>> >requests hitting the device are sequential.  I can't see how any
>> >application does that kind of I/O pattern.
>> 
>> An NNTP server that has many incoming connections, handled by
>> multiple threads, and that stores the data in cyclic buffers?
>
>Then whichever thread dumps the buffer content to storage will do
>one large contiguous I/O.

In this context, "cyclic buffer" means "large fixed-size file" or
"disk partition", and when the end of that file/partition is reached,
writing resumes at the start (wraps around, starts the next cycle).

Each thread writes articles to disk, which can range in size
from 1K to 1M. Taken all together the writes are sequential, but the
writes from any one thread are definitely not.

This is a real-world example: I have written software that does
exactly this, multithreaded versions of INN with CNFS storage do
exactly this, and Diablo does something comparable (only it uses
processes instead of threads).
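
To make that pattern concrete, here is a minimal user-space sketch of
such a cyclic-buffer writer; it is purely illustrative, and the file
name, sizes and locking scheme are assumptions rather than code from
INN, CNFS or Diablo:

/*
 * Illustrative sketch only: 16 threads store variable-sized articles in
 * one shared, fixed-size cyclic spool.  Each thread reserves the next
 * free stretch under a lock, wrapping to the start when the spool is
 * full, then writes its article with pwrite().  Any single thread's
 * offsets are scattered, but the spool as a whole fills sequentially.
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SPOOL_SIZE (1024UL * 1024 * 1024)       /* fixed-size spool */
#define NTHREADS   16

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static off_t head;                              /* next free byte in the spool */
static int spool_fd;

static off_t reserve(size_t len)
{
        off_t off;

        pthread_mutex_lock(&lock);
        if (head + (off_t)len > (off_t)SPOOL_SIZE)
                head = 0;                       /* wrap: start the next cycle */
        off = head;
        head += len;
        pthread_mutex_unlock(&lock);
        return off;
}

static void *store_articles(void *arg)
{
        char *article = malloc(1024 * 1024);

        if (!article)
                return NULL;
        for (int i = 0; i < 100; i++) {
                /* article sizes anywhere from 1K to 1M */
                size_t len = 1024 + (size_t)rand() % (1024 * 1024 - 1024);
                off_t off = reserve(len);

                memset(article, 'x', len);      /* stand-in for real article data */
                if (pwrite(spool_fd, article, len, off) < 0)
                        perror("pwrite");
        }
        free(article);
        return NULL;
}

int main(void)
{
        pthread_t tid[NTHREADS];

        spool_fd = open("news.spool", O_RDWR | O_CREAT, 0644);
        if (spool_fd < 0) {
                perror("open");
                return 1;
        }
        for (long i = 0; i < NTHREADS; i++)
                pthread_create(&tid[i], NULL, store_articles, NULL);
        for (long i = 0; i < NTHREADS; i++)
                pthread_join(tid[i], NULL);
        close(spool_fd);
        return 0;
}

Each thread's pwrite() offsets jump around, yet the spool as a whole
fills front to back and wraps, which is exactly the "sequential overall,
non-contiguous per thread" pattern under discussion.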

Mike.


RE: cfq performance gap

2006-12-13 Thread Chen, Kenneth W
Miquel van Smoorenburg wrote on Wednesday, December 13, 2006 1:57 AM
> Chen, Kenneth W <[EMAIL PROTECTED]> wrote:
> >This rawio test plows through sequential I/O, handing each small record
> >to the threads round-robin (modulo the number of threads).  So each thread
> >appears non-contiguous within its own process context, while the overall
> >requests hitting the device are sequential.  I can't see how any
> >application does that kind of I/O pattern.
> 
> An NNTP server that has many incoming connections, handled by
> multiple threads, and that stores the data in cyclic buffers?

Then whichever thread dumps the buffer content to storage will do
one large contiguous I/O.


Re: cfq performance gap

2006-12-13 Thread Miquel van Smoorenburg
In article <[EMAIL PROTECTED]>,
Chen, Kenneth W <[EMAIL PROTECTED]> wrote:
>This rawio test plows through sequential I/O, handing each small record
>to the threads round-robin (modulo the number of threads).  So each thread
>appears non-contiguous within its own process context, while the overall
>requests hitting the device are sequential.  I can't see how any
>application does that kind of I/O pattern.

An NNTP server that has many incoming connections, handled by
multiple threads, and that stores the data in cyclic buffers?

Mike.


Re: cfq performance gap

2006-12-12 Thread Jens Axboe
On Tue, Dec 12 2006, AVANTIKA R. MATHUR wrote:
> >That said, I might add some logic to detect when we can cheaply switch
> >queues instead of waiting for a new request from the same queue.
> >Averaging slice times over a period of time instead of 1:1 with that
> >logic should help cases like this while still being fair.
> >  
> Thank you for looking at this issue.
> I've found an IBM/SUSE bugzilla bug for the same performance gap on 
> rawio. There was a fix for this bug included in SLES10-RC1, do you know 
> why it was not added in mainline?

Which bug do you mean? It was likely me doing the fixing on that bug,
and I'm certain that the patch is in mainline. If you had included the
bug number, I could have expanded on that.

-- 
Jens Axboe



RE: cfq performance gap

2006-12-12 Thread Chen, Kenneth W
AVANTIKA R. MATHUR wrote on Tuesday, December 12, 2006 5:33 PM
> >> rawio is actually performing sequential reads, but I don't believe it is
> >> purely sequential with the multiple processes.
> >> I am currently running the test with longer runtimes and will post
> >> results once it is complete.
> >> I've also attached the rawio source.
> >> 
> >
> > It's certainly the slice and idling hurting here. But at the same time,
> > I don't really think your test case is very interesting. The test area
> > is very small and you have 16 threads trying to read the same thing,
> > optimizing for that would be silly as I don't think it has much real
> > world relevance.
> 
> Could a database have similar workload to this test?


No.

Nothing I have seen with db workloads exhibits such a pattern.  There are
basically two types of db workloads: one does transaction processing,
where the I/O pattern is truly random with a large stride, both in the
context of each process and in the overall I/O seen at the device level.
The second is decision-making type db queries, which do large sequential
I/O within one process context.

This rawio test plows through sequential I/O, handing each small record
to the threads round-robin (modulo the number of threads).  So each thread
appears non-contiguous within its own process context, while the overall
requests hitting the device are sequential.  I can't see how any
application does that kind of I/O pattern.
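
For illustration, that access pattern could be sketched as below; this
is an assumed reconstruction, not the attached rawio source, and the
device path and record counts are made up:

/*
 * Assumed sketch of the access pattern (not the attached rawio source):
 * record k of the test area is read by thread k % NTHREADS, so each
 * thread's reads are strided and non-contiguous, while the union of all
 * threads sweeps the device sequentially.
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define NTHREADS 16
#define RECSIZE  4096                   /* 4k records, as in the benchmark */
#define NRECORDS 16384                  /* size of the test area, made up */

static int fd;

static void *reader(void *arg)
{
        long id = (long)arg;
        char *buf = malloc(RECSIZE);

        if (!buf)
                return NULL;
        /* thread 'id' reads records id, id + NTHREADS, id + 2*NTHREADS, ... */
        for (long k = id; k < NRECORDS; k += NTHREADS)
                if (pread(fd, buf, RECSIZE, (off_t)k * RECSIZE) < 0)
                        perror("pread");
        free(buf);
        return NULL;
}

int main(void)
{
        pthread_t tid[NTHREADS];

        fd = open("/dev/sda", O_RDONLY);        /* raw device read, needs privileges */
        if (fd < 0) {
                perror("open");
                return 1;
        }
        for (long i = 0; i < NTHREADS; i++)
                pthread_create(&tid[i], NULL, reader, (void *)i);
        for (long i = 0; i < NTHREADS; i++)
                pthread_join(tid[i], NULL);
        close(fd);
        return 0;
}

From CFQ's per-process point of view none of these threads looks
sequential, even though all 16 of them together sweep the test area in
order.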

- Ken


Re: cfq performance gap

2006-12-12 Thread AVANTIKA R. MATHUR

Jens Axboe wrote:
> On Fri, Dec 08 2006, Avantika Mathur wrote:
> > On Fri, 2006-12-08 at 13:05 +0100, Jens Axboe wrote:
> > > On Thu, Dec 07 2006, Avantika Mathur wrote:
> > > > Hi Jens,
> > >
> > > (you probably noticed now, but the [EMAIL PROTECTED] email is no longer
> > > valid)
> >
> > I saw that, thanks!
> > > > I've noticed a performance gap between the cfq scheduler and other io
> > > > schedulers when running the rawio benchmark.
> > > >
> > > > The benchmark workload is 16 processes running 4k random reads.
> > > >
> > > > Is this performance gap a known issue?
> > >
> > > CFQ could be a little slower at this benchmark, but your results are
> > > much worse than I would expect. What is the queueing depth of sda? How
> > > are you invoking rawio?
> >
> > I am running rawio with the following options:
> > rawread -p 16 -m 1 -d 1 -x -z -t 0 -s 4096
> >
> > The queue depth on sda is 4.
> >
> > > Your runtime is very low, how does it look if you allow the test to run
> > > for much longer? 30MiB/sec random read bandwidth seems very high, I'm
> > > wondering what exactly is being tested here.
> >
> > rawio is actually performing sequential reads, but I don't believe it is
> > purely sequential with the multiple processes.
> > I am currently running the test with longer runtimes and will post
> > results once it is complete.
> > I've also attached the rawio source.
>
> It's certainly the slice and idling hurting here. But at the same time,
> I don't really think your test case is very interesting. The test area
> is very small and you have 16 threads trying to read the same thing,
> optimizing for that would be silly as I don't think it has much real
> world relevance.

Could a database have similar workload to this test?

> That said, I might add some logic to detect when we can cheaply switch
> queues instead of waiting for a new request from the same queue.
> Averaging slice times over a period of time instead of 1:1 with that
> logic should help cases like this while still being fair.

Thank you for looking at this issue.
I've found an IBM/SUSE bugzilla bug for the same performance gap on
rawio. There was a fix for this bug included in SLES10-RC1, do you know
why it was not added in mainline?

Thanks again,
Avantika Mathur


Re: cfq performance gap

2006-12-11 Thread Jens Axboe
On Fri, Dec 08 2006, Avantika Mathur wrote:
> On Fri, 2006-12-08 at 13:05 +0100, Jens Axboe wrote:
> > On Thu, Dec 07 2006, Avantika Mathur wrote:
> > > Hi Jens,
> > 
> > (you probably noticed now, but the [EMAIL PROTECTED] email is no longer
> > valid)
> 
> I saw that, thanks!
> > > I've noticed a performance gap between the cfq scheduler and other io
> > > schedulers when running the rawio benchmark.
> > > Results from rawio on 2.6.19, cfq and noop schedulers:
> > >
> > > CFQ:
> > >
> > > procs   device      num read   KB/sec   I/O Ops/sec
> > > -----   --------    --------   ------   -----------
> > >    16   /dev/sda       16412     8338          2084
> > > -----   --------    --------   ------   -----------
> > >    16                  16412     8338          2084
> > >
> > > Total run time 0.492072 seconds
> > >
> > >
> > > NOOP:
> > >
> > > procs   device      num read   KB/sec   I/O Ops/sec
> > > -----   --------    --------   ------   -----------
> > >    16   /dev/sda       16399    29224          7306
> > > -----   --------    --------   ------   -----------
> > >    16                  16399    29224          7306
> > >
> > > Total run time 0.140284 seconds
> > >
> > > The benchmark workload is 16 processes running 4k random reads.
> > >
> > > Is this performance gap a known issue?
> > 
> > CFQ could be a little slower at this benchmark, but your results are
> > much worse than I would expect. What is the queueing depth of sda? How
> > are you invoking rawio?
> 
> I am running rawio with the following options:
> rawread -p 16 -m 1 -d 1 -x -z -t 0 -s 4096
>  
> The queue depth on sda is 4.
> 
> > 
> > Your runtime is very low, how does it look if you allow the test to run
> > for much longer? 30MiB/sec random read bandwidth seems very high, I'm
> > wondering what exactly is being tested here.
> > 
> 
> rawio is actually performing sequential reads, but I don't believe it is
> purely sequential with the multiple processes.
> I am currently running the test with longer runtimes and will post
> results once it is complete. 
> I've also attached the rawio source.

It's certainly the slice and idling hurting here. But at the same time,
I don't really think your test case is very interesting. The test area
is very small and you have 16 threads trying to read the same thing,
optimizing for that would be silly as I don't think it has much real
world relevance.

That said, I might add some logic to detect when we can cheaply switch
queues instead of waiting for a new request from the same queue.
Averaging slice times over a period of time instead of 1:1 with that
logic should help cases like this while still being fair.
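
For illustration only, the kind of check being proposed could take
roughly this shape; the structures, names and threshold below are
hypothetical and are not cfq-iosched.c code or an actual patch:

/*
 * Schematic only -- hypothetical structures and names.  The idea: before
 * idling to wait for the current queue's next request, check whether
 * another queue already has a request close to the disk head.  If the
 * switch is that cheap, do it now and settle fairness over a longer
 * window via a per-queue credit instead of strictly one slice at a time.
 */
struct io_queue {
        unsigned long long next_sector; /* first pending request, if any */
        int has_request;
        long slice_credit;              /* long-term fairness accounting */
};

#define CHEAP_SWITCH_SECTORS 128        /* "close enough" threshold, made up */

static struct io_queue *maybe_switch(struct io_queue *cur,
                                     struct io_queue *queues, int nr,
                                     unsigned long long head_sector)
{
        for (int i = 0; i < nr; i++) {
                struct io_queue *q = &queues[i];
                unsigned long long dist;

                if (q == cur || !q->has_request)
                        continue;
                dist = q->next_sector > head_sector ?
                       q->next_sector - head_sector :
                       head_sector - q->next_sector;
                if (dist <= CHEAP_SWITCH_SECTORS) {
                        /* service the nearby queue now, repay 'cur' later */
                        cur->slice_credit++;
                        return q;
                }
        }
        return NULL;                    /* nothing cheap nearby: idle as usual */
}

The only point is the shape of the decision: if another queue already
has work next to the head, servicing it immediately beats idling, and
fairness is then settled over a longer window through the per-queue
credit rather than strictly per slice.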

-- 
Jens Axboe



Re: cfq performance gap

2006-12-08 Thread Avantika Mathur
On Fri, 2006-12-08 at 13:05 +0100, Jens Axboe wrote:
> On Thu, Dec 07 2006, Avantika Mathur wrote:
> > Hi Jens,
> 
> (you probably noticed now, but the [EMAIL PROTECTED] email is no longer
> valid)

I saw that, thanks!
> > I've noticed a performance gap between the cfq scheduler and other io
> > schedulers when running the rawio benchmark.
> > Results from rawio on 2.6.19, cfq and noop schedulers:
> >
> > CFQ:
> >
> > procs   device      num read   KB/sec   I/O Ops/sec
> > -----   --------    --------   ------   -----------
> >    16   /dev/sda       16412     8338          2084
> > -----   --------    --------   ------   -----------
> >    16                  16412     8338          2084
> >
> > Total run time 0.492072 seconds
> >
> >
> > NOOP:
> >
> > procs   device      num read   KB/sec   I/O Ops/sec
> > -----   --------    --------   ------   -----------
> >    16   /dev/sda       16399    29224          7306
> > -----   --------    --------   ------   -----------
> >    16                  16399    29224          7306
> >
> > Total run time 0.140284 seconds
> >
> > The benchmark workload is 16 processes running 4k random reads.
> >
> > Is this performance gap a known issue?
> 
> CFQ could be a little slower at this benchmark, but your results are
> much worse than I would expect. What is the queueing depth of sda? How
> are you invoking rawio?

I am running rawio with the following options:
rawread -p 16 -m 1 -d 1 -x -z -t 0 -s 4096
 
The queue depth on sda is 4.

> 
> Your runtime is very low, how does it look if you allow the test to run
> for much longer? 30MiB/sec random read bandwidth seems very high, I'm
> wondering what exactly is being tested here.
> 

rawio is actually performing sequential reads, but I don't believe it is
purely sequential with the multiple processes.
I am currently running the test with longer runtimes and will post
results once it is complete. 
I've also attached the rawio source.

Thanks,
Avantika



rawio-2.4.2.tar.gz
Description: application/compressed-tar


Re: cfq performance gap

2006-12-08 Thread Jens Axboe
On Thu, Dec 07 2006, Avantika Mathur wrote:
> Hi Jens, 

(you probably noticed now, but the [EMAIL PROTECTED] email is no longer
valid)

> I've noticed a performance gap between the cfq scheduler and other io
> schedulers when running the rawio benchmark. 
> Results from rawio on 2.6.19, cfq and noop schedulers: 
> 
> CFQ: 
> 
> procs   device      num read   KB/sec   I/O Ops/sec
> -----   --------    --------   ------   -----------
>    16   /dev/sda       16412     8338          2084
> -----   --------    --------   ------   -----------
>    16                  16412     8338          2084
> 
> Total run time 0.492072 seconds 
> 
> 
> NOOP: 
> 
> procs   device      num read   KB/sec   I/O Ops/sec
> -----   --------    --------   ------   -----------
>    16   /dev/sda       16399    29224          7306
> -----   --------    --------   ------   -----------
>    16                  16399    29224          7306
> 
> Total run time 0.140284 seconds 
> 
> The benchmark workload is 16 processes running 4k random reads. 
> 
> Is this performance gap a known issue? 

CFQ could be a little slower at this benchmark, but your results are
much worse than I would expect. What is the queueing depth of sda? How
are you invoking rawio?

Your runtime is very low, how does it look if you allow the test to run
for much longer? 30MiB/sec random read bandwidth seems very high, I'm
wondering what exactly is being tested here.

-- 
Jens Axboe


