Re: [PATCH 000 of 35] Refactor block layer to improve support for stacked devices.

2007-08-01 Thread Neil Brown
On Wednesday August 1, [EMAIL PROTECTED] wrote:
> 
> In any case, why does something so complicated need to be a macro, why
> not a function instead?

There needs to be a macro so you can put a statement after it to be
executed "for each ..."
But you are right that it doesn't all need to be in the one macro.

The idea of something like

#define bio_for_each_segment_offset(bv, bio, _i, offset, _size) \
for (bio_iterator_init(bio, &_i, , offset, _size);   \
 i.remaining > 0 ;  \
 bio_next(bio, &_i, ))

with bio_iterator_init and bio_next being (inline?) functions is a
very good one.  I'll see what works.

Thanks,
NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 000 of 35] Refactor block layer to improve support for stacked devices.

2007-08-01 Thread Tejun Heo
On Wed, Aug 01, 2007 at 11:52:35AM -0400, John Stoffel wrote:
> 
> Tejun> Avi Kivity wrote:
> >> NeilBrown wrote:
> >>> To achieve this, the "for_each" macros are now somewhat more complex.
> >>> For example, rq_for_each_segment is:
> >>> 
> >>> #define bio_for_each_segment_offset(bv, bio, _i, offs, _size)\
> >>> for (_i.i = 0, _i.offset = (bio)->bi_offset + offs,\
> >>> _i.size = min_t(int, _size, (bio)->bi_size - offs);   \
> >>> _i.i < (bio)->bi_vcnt && _i.size > 0;\
> >>> _i.i++)\
> >>> if (bv = *bio_iovec_idx((bio), _i.i),\
> >>> bv.bv_offset += _i.offset,\
> >>> bv.bv_len <= _i.offset\
> >>> ? (_i.offset -= bv.bv_len, 0)\
> >>> : (bv.bv_len -= _i.offset,\
> >>> _i.offset = 0,\
> >>> bv.bv_len < _i.size\
> >>> ? (_i.size -= bv.bv_len, 1)\
> >>> : (bv.bv_len = _i.size,\
> >>> _i.size = 0,\
> >>> bv.bv_len > 0)))
> >>> 
> >>> #define bio_for_each_segment(bv, bio, __i)\
> >>> bio_for_each_segment_offset(bv, bio, __i, 0, (bio)->bi_size)
> >>> 
> >>> It does some with some explanatory text in a comment, but it is still
> >>> a bit daunting.  Any suggestions on making this more approachable
> >>> would be very welcome.
> >>> 
> >>> 
> >> 
> >> Well, I hesitate to state the obvious, but how about:
> >> 
> >> #define bio_for_each_segment_offset(bv, bio, _i, offs, _size) \
> >> for (bio_iterator_init(&_i, ...); bio_iterator_cont(&_i, ...);
> >> bio_iterator_advance(&_i, ...)) \
> >> if (bio_iterator_want_segment(&_i, ...))
> >> 
> >> While this doesn't remove the complexity, at least it's readable.
> 
> Tejun> Violently seconded.
> 
> How about it be made into a real function instead?  I was reading
> through the patch, but got timed out yesterday, so take this with a
> grain of salt.  
> 
> I thought I saw a couple of macros defined to use this macro yet
> again.  Which I figured might be a problem is the passed in variables
> get munged.
> 
> In any case, why does something so complicated need to be a macro, why
> not a function instead?

I agree and actually wrote about the same opinion in one of the
replies.  It might even be benefitial performance-wise due to smaller
cache foot print.

Thanks.

-- 
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 000 of 35] Refactor block layer to improve support for stacked devices.

2007-08-01 Thread John Stoffel

Tejun> Avi Kivity wrote:
>> NeilBrown wrote:
>>> To achieve this, the "for_each" macros are now somewhat more complex.
>>> For example, rq_for_each_segment is:
>>> 
>>> #define bio_for_each_segment_offset(bv, bio, _i, offs, _size)\
>>> for (_i.i = 0, _i.offset = (bio)->bi_offset + offs,\
>>> _i.size = min_t(int, _size, (bio)->bi_size - offs);   \
>>> _i.i < (bio)->bi_vcnt && _i.size > 0;\
>>> _i.i++)\
>>> if (bv = *bio_iovec_idx((bio), _i.i),\
>>> bv.bv_offset += _i.offset,\
>>> bv.bv_len <= _i.offset\
>>> ? (_i.offset -= bv.bv_len, 0)\
>>> : (bv.bv_len -= _i.offset,\
>>> _i.offset = 0,\
>>> bv.bv_len < _i.size\
>>> ? (_i.size -= bv.bv_len, 1)\
>>> : (bv.bv_len = _i.size,\
>>> _i.size = 0,\
>>> bv.bv_len > 0)))
>>> 
>>> #define bio_for_each_segment(bv, bio, __i)\
>>> bio_for_each_segment_offset(bv, bio, __i, 0, (bio)->bi_size)
>>> 
>>> It does some with some explanatory text in a comment, but it is still
>>> a bit daunting.  Any suggestions on making this more approachable
>>> would be very welcome.
>>> 
>>> 
>> 
>> Well, I hesitate to state the obvious, but how about:
>> 
>> #define bio_for_each_segment_offset(bv, bio, _i, offs, _size) \
>> for (bio_iterator_init(&_i, ...); bio_iterator_cont(&_i, ...);
>> bio_iterator_advance(&_i, ...)) \
>> if (bio_iterator_want_segment(&_i, ...))
>> 
>> While this doesn't remove the complexity, at least it's readable.

Tejun> Violently seconded.

How about it be made into a real function instead?  I was reading
through the patch, but got timed out yesterday, so take this with a
grain of salt.  

I thought I saw a couple of macros defined to use this macro yet
again.  Which I figured might be a problem is the passed in variables
get munged.

In any case, why does something so complicated need to be a macro, why
not a function instead?

John
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 000 of 35] Refactor block layer to improve support for stacked devices.

2007-08-01 Thread Tejun Heo
Avi Kivity wrote:
> NeilBrown wrote:
>> To achieve this, the "for_each" macros are now somewhat more complex.
>> For example, rq_for_each_segment is:
>>
>> #define bio_for_each_segment_offset(bv, bio, _i, offs, _size)\
>> for (_i.i = 0, _i.offset = (bio)->bi_offset + offs,\
>>  _i.size = min_t(int, _size, (bio)->bi_size - offs);   \
>>  _i.i < (bio)->bi_vcnt && _i.size > 0;\
>>  _i.i++)\
>> if (bv = *bio_iovec_idx((bio), _i.i),\
>> bv.bv_offset += _i.offset,\
>> bv.bv_len <= _i.offset\
>> ? (_i.offset -= bv.bv_len, 0)\
>> : (bv.bv_len -= _i.offset,\
>>_i.offset = 0,\
>>bv.bv_len < _i.size\
>>? (_i.size -= bv.bv_len, 1)\
>>: (bv.bv_len = _i.size,\
>>   _i.size = 0,\
>>   bv.bv_len > 0)))
>>
>> #define bio_for_each_segment(bv, bio, __i)\
>> bio_for_each_segment_offset(bv, bio, __i, 0, (bio)->bi_size)
>>
>> It does some with some explanatory text in a comment, but it is still
>> a bit daunting.  Any suggestions on making this more approachable
>> would be very welcome.
>>
>>   
> 
> Well, I hesitate to state the obvious, but how about:
> 
> #define bio_for_each_segment_offset(bv, bio, _i, offs, _size) \
>for (bio_iterator_init(&_i, ...); bio_iterator_cont(&_i, ...);
> bio_iterator_advance(&_i, ...)) \
> if (bio_iterator_want_segment(&_i, ...))
> 
> While this doesn't remove the complexity, at least it's readable.

Violently seconded.

-- 
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 000 of 35] Refactor block layer to improve support for stacked devices.

2007-08-01 Thread Tejun Heo
Avi Kivity wrote:
 NeilBrown wrote:
 To achieve this, the for_each macros are now somewhat more complex.
 For example, rq_for_each_segment is:

 #define bio_for_each_segment_offset(bv, bio, _i, offs, _size)\
 for (_i.i = 0, _i.offset = (bio)-bi_offset + offs,\
  _i.size = min_t(int, _size, (bio)-bi_size - offs);   \
  _i.i  (bio)-bi_vcnt  _i.size  0;\
  _i.i++)\
 if (bv = *bio_iovec_idx((bio), _i.i),\
 bv.bv_offset += _i.offset,\
 bv.bv_len = _i.offset\
 ? (_i.offset -= bv.bv_len, 0)\
 : (bv.bv_len -= _i.offset,\
_i.offset = 0,\
bv.bv_len  _i.size\
? (_i.size -= bv.bv_len, 1)\
: (bv.bv_len = _i.size,\
   _i.size = 0,\
   bv.bv_len  0)))

 #define bio_for_each_segment(bv, bio, __i)\
 bio_for_each_segment_offset(bv, bio, __i, 0, (bio)-bi_size)

 It does some with some explanatory text in a comment, but it is still
 a bit daunting.  Any suggestions on making this more approachable
 would be very welcome.

   
 
 Well, I hesitate to state the obvious, but how about:
 
 #define bio_for_each_segment_offset(bv, bio, _i, offs, _size) \
for (bio_iterator_init(_i, ...); bio_iterator_cont(_i, ...);
 bio_iterator_advance(_i, ...)) \
 if (bio_iterator_want_segment(_i, ...))
 
 While this doesn't remove the complexity, at least it's readable.

Violently seconded.

-- 
tejun
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 000 of 35] Refactor block layer to improve support for stacked devices.

2007-08-01 Thread Tejun Heo
On Wed, Aug 01, 2007 at 11:52:35AM -0400, John Stoffel wrote:
 
 Tejun Avi Kivity wrote:
  NeilBrown wrote:
  To achieve this, the for_each macros are now somewhat more complex.
  For example, rq_for_each_segment is:
  
  #define bio_for_each_segment_offset(bv, bio, _i, offs, _size)\
  for (_i.i = 0, _i.offset = (bio)-bi_offset + offs,\
  _i.size = min_t(int, _size, (bio)-bi_size - offs);   \
  _i.i  (bio)-bi_vcnt  _i.size  0;\
  _i.i++)\
  if (bv = *bio_iovec_idx((bio), _i.i),\
  bv.bv_offset += _i.offset,\
  bv.bv_len = _i.offset\
  ? (_i.offset -= bv.bv_len, 0)\
  : (bv.bv_len -= _i.offset,\
  _i.offset = 0,\
  bv.bv_len  _i.size\
  ? (_i.size -= bv.bv_len, 1)\
  : (bv.bv_len = _i.size,\
  _i.size = 0,\
  bv.bv_len  0)))
  
  #define bio_for_each_segment(bv, bio, __i)\
  bio_for_each_segment_offset(bv, bio, __i, 0, (bio)-bi_size)
  
  It does some with some explanatory text in a comment, but it is still
  a bit daunting.  Any suggestions on making this more approachable
  would be very welcome.
  
  
  
  Well, I hesitate to state the obvious, but how about:
  
  #define bio_for_each_segment_offset(bv, bio, _i, offs, _size) \
  for (bio_iterator_init(_i, ...); bio_iterator_cont(_i, ...);
  bio_iterator_advance(_i, ...)) \
  if (bio_iterator_want_segment(_i, ...))
  
  While this doesn't remove the complexity, at least it's readable.
 
 Tejun Violently seconded.
 
 How about it be made into a real function instead?  I was reading
 through the patch, but got timed out yesterday, so take this with a
 grain of salt.  
 
 I thought I saw a couple of macros defined to use this macro yet
 again.  Which I figured might be a problem is the passed in variables
 get munged.
 
 In any case, why does something so complicated need to be a macro, why
 not a function instead?

I agree and actually wrote about the same opinion in one of the
replies.  It might even be benefitial performance-wise due to smaller
cache foot print.

Thanks.

-- 
tejun
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 000 of 35] Refactor block layer to improve support for stacked devices.

2007-08-01 Thread John Stoffel

Tejun Avi Kivity wrote:
 NeilBrown wrote:
 To achieve this, the for_each macros are now somewhat more complex.
 For example, rq_for_each_segment is:
 
 #define bio_for_each_segment_offset(bv, bio, _i, offs, _size)\
 for (_i.i = 0, _i.offset = (bio)-bi_offset + offs,\
 _i.size = min_t(int, _size, (bio)-bi_size - offs);   \
 _i.i  (bio)-bi_vcnt  _i.size  0;\
 _i.i++)\
 if (bv = *bio_iovec_idx((bio), _i.i),\
 bv.bv_offset += _i.offset,\
 bv.bv_len = _i.offset\
 ? (_i.offset -= bv.bv_len, 0)\
 : (bv.bv_len -= _i.offset,\
 _i.offset = 0,\
 bv.bv_len  _i.size\
 ? (_i.size -= bv.bv_len, 1)\
 : (bv.bv_len = _i.size,\
 _i.size = 0,\
 bv.bv_len  0)))
 
 #define bio_for_each_segment(bv, bio, __i)\
 bio_for_each_segment_offset(bv, bio, __i, 0, (bio)-bi_size)
 
 It does some with some explanatory text in a comment, but it is still
 a bit daunting.  Any suggestions on making this more approachable
 would be very welcome.
 
 
 
 Well, I hesitate to state the obvious, but how about:
 
 #define bio_for_each_segment_offset(bv, bio, _i, offs, _size) \
 for (bio_iterator_init(_i, ...); bio_iterator_cont(_i, ...);
 bio_iterator_advance(_i, ...)) \
 if (bio_iterator_want_segment(_i, ...))
 
 While this doesn't remove the complexity, at least it's readable.

Tejun Violently seconded.

How about it be made into a real function instead?  I was reading
through the patch, but got timed out yesterday, so take this with a
grain of salt.  

I thought I saw a couple of macros defined to use this macro yet
again.  Which I figured might be a problem is the passed in variables
get munged.

In any case, why does something so complicated need to be a macro, why
not a function instead?

John
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 000 of 35] Refactor block layer to improve support for stacked devices.

2007-08-01 Thread Neil Brown
On Wednesday August 1, [EMAIL PROTECTED] wrote:
 
 In any case, why does something so complicated need to be a macro, why
 not a function instead?

There needs to be a macro so you can put a statement after it to be
executed for each ...
But you are right that it doesn't all need to be in the one macro.

The idea of something like

#define bio_for_each_segment_offset(bv, bio, _i, offset, _size) \
for (bio_iterator_init(bio, _i, bv, offset, _size);   \
 i.remaining  0 ;  \
 bio_next(bio, _i, bv))

with bio_iterator_init and bio_next being (inline?) functions is a
very good one.  I'll see what works.

Thanks,
NeilBrown
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 000 of 35] Refactor block layer to improve support for stacked devices.

2007-07-31 Thread Avi Kivity

NeilBrown wrote:

To achieve this, the "for_each" macros are now somewhat more complex.
For example, rq_for_each_segment is:

#define bio_for_each_segment_offset(bv, bio, _i, offs, _size)   \
for (_i.i = 0, _i.offset = (bio)->bi_offset + offs,  \
 _i.size = min_t(int, _size, (bio)->bi_size - offs); \
 _i.i < (bio)->bi_vcnt && _i.size > 0; \
 _i.i++)\
if (bv = *bio_iovec_idx((bio), _i.i),   \
bv.bv_offset += _i.offset,  \
bv.bv_len <= _i.offset   \
? (_i.offset -= bv.bv_len, 0)   \
: (bv.bv_len -= _i.offset,  \
   _i.offset = 0,   \
   bv.bv_len < _i.size   \
   ? (_i.size -= bv.bv_len, 1)  \
   : (bv.bv_len = _i.size,  \
  _i.size = 0,  \
  bv.bv_len > 0)))

#define bio_for_each_segment(bv, bio, __i)  \
bio_for_each_segment_offset(bv, bio, __i, 0, (bio)->bi_size)

It does some with some explanatory text in a comment, but it is still
a bit daunting.  Any suggestions on making this more approachable
would be very welcome.

  


Well, I hesitate to state the obvious, but how about:

#define bio_for_each_segment_offset(bv, bio, _i, offs, _size) \
   for (bio_iterator_init(&_i, ...); bio_iterator_cont(&_i, ...); 
bio_iterator_advance(&_i, ...)) \

if (bio_iterator_want_segment(&_i, ...))

While this doesn't remove the complexity, at least it's readable.

--
error compiling committee.c: too many arguments to function

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 000 of 35] Refactor block layer to improve support for stacked devices.

2007-07-31 Thread Avi Kivity

NeilBrown wrote:

To achieve this, the for_each macros are now somewhat more complex.
For example, rq_for_each_segment is:

#define bio_for_each_segment_offset(bv, bio, _i, offs, _size)   \
for (_i.i = 0, _i.offset = (bio)-bi_offset + offs,  \
 _i.size = min_t(int, _size, (bio)-bi_size - offs); \
 _i.i  (bio)-bi_vcnt  _i.size  0; \
 _i.i++)\
if (bv = *bio_iovec_idx((bio), _i.i),   \
bv.bv_offset += _i.offset,  \
bv.bv_len = _i.offset   \
? (_i.offset -= bv.bv_len, 0)   \
: (bv.bv_len -= _i.offset,  \
   _i.offset = 0,   \
   bv.bv_len  _i.size   \
   ? (_i.size -= bv.bv_len, 1)  \
   : (bv.bv_len = _i.size,  \
  _i.size = 0,  \
  bv.bv_len  0)))

#define bio_for_each_segment(bv, bio, __i)  \
bio_for_each_segment_offset(bv, bio, __i, 0, (bio)-bi_size)

It does some with some explanatory text in a comment, but it is still
a bit daunting.  Any suggestions on making this more approachable
would be very welcome.

  


Well, I hesitate to state the obvious, but how about:

#define bio_for_each_segment_offset(bv, bio, _i, offs, _size) \
   for (bio_iterator_init(_i, ...); bio_iterator_cont(_i, ...); 
bio_iterator_advance(_i, ...)) \

if (bio_iterator_want_segment(_i, ...))

While this doesn't remove the complexity, at least it's readable.

--
error compiling committee.c: too many arguments to function

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 000 of 35] Refactor block layer to improve support for stacked devices.

2007-07-30 Thread NeilBrown

The following 35(!) patches achieve a refactoring of some parts of the
block layer to provide better support for stacked devices.

The core issue is that of letting bio_add_page know the limitation
that the device imposes so that it doesn't create a bio that is too large.

For a unstacked disk device (e.g. scsi), bio_add_page can access
max_nr_sectors and max_nr_segments and some other details to know how
segments should be counted, and does the appropriate checks (this is a
simplification, get is close enough for this discussion).

For stacked devices (dm, md etc) bio_add_page can also call into the
driver via merge_bvec_fn to find out if a page can be added to a bio.

This mostly works for a simple stack (e.g. md on scsi) but breaks down
with more complicated stacks (dm on md on scsi) as the recusive calls
to merge_bvec_fn that are required are difficult to get right, and
don't provide any guarantees in the face of array reconfiguration anyway.
dm and md both take the approach of "if the never level down defines
merge_bvec_fn, then set max_sectors to PAGE_SIZE/512 and live with small
requests".

So this patchset introduces a new approach.  bio_add_page is allowed
to create bios as big as it likes, and each layer is responsible for
splitting that bio up as required.

For intermediate levels like raid0, a number of new bios might be
created which refer to parts of the original, including parts of the
bi_io_vec.

For the bottom level driver (__make_request), each "struct request"
can refer to just part of a bio, so a bio can be effectively split
among several requests (a request can still reference multiple small
bios, and can concievable list parts of large bios and some small bios
as well, though the merging required to achieve this isn't implemented
yet - that patch set is big enough as it is).

This requires that the bi_io_vec become immutable, and that certain
parts of the bio become immutable.

To achieve this, we introduce fields into the bio so that it can point
to just part of the bi_io_vec (an offset and a size) and introduce
similar fields into 'struct request' to refer to only part of a bio list.

I am keen to receive both review and testing.  I have tested it on
SATA drives with a range of md configurations, but haven't tested dm,
or ide-floppy, or various other bits that needed to be changed.

Probably the changes that are mostly likely to raise eyebrows involve
the code to iterate over the segments in a bio or in a 'struct
request', so I'll give a bit more detail about them here.

Previously these (bio_for_each_segment, rq_for_each_bio) were simple
macros that provided pointers into bi_io_vec.  

As the actual segments that a request might need to handle may no
longer be explicitly in bi_io_vec (e.g. an offset might need to be
added, or a size restriction might need to be imposed) this is no
longer possible.  Instead, these functions (now rq_for_each_segment
and bio_for_each_segment) fill in a 'struct bio_vec' with appropriate
values.  e.g.
  struct bio_vec bvec;
  struct bio_iterator i;
  bio_for_each_segment(bvec, bio, i)
use bvec.bv_page, bvec.bi_offset, bvec.bv_len

This might seem like data is being copied around a bit more, but it
should all be in L1 cache and could conceivable be optimised into
registers by the compiler, so I don't believe this is a big problem
(no, I haven't figured a good way to test it).

To achieve this, the "for_each" macros are now somewhat more complex.
For example, rq_for_each_segment is:

#define bio_for_each_segment_offset(bv, bio, _i, offs, _size)   \
for (_i.i = 0, _i.offset = (bio)->bi_offset + offs, \
 _i.size = min_t(int, _size, (bio)->bi_size - offs);\
 _i.i < (bio)->bi_vcnt && _i.size > 0;  \
 _i.i++)\
if (bv = *bio_iovec_idx((bio), _i.i),   \
bv.bv_offset += _i.offset,  \
bv.bv_len <= _i.offset  \
? (_i.offset -= bv.bv_len, 0)   \
: (bv.bv_len -= _i.offset,  \
   _i.offset = 0,   \
   bv.bv_len < _i.size  \
   ? (_i.size -= bv.bv_len, 1)  \
   : (bv.bv_len = _i.size,  \
  _i.size = 0,  \
  bv.bv_len > 0)))

#define bio_for_each_segment(bv, bio, __i)  \
bio_for_each_segment_offset(bv, bio, __i, 0, (bio)->bi_size)

It does some with some explanatory text in a comment, but it is still
a bit daunting.  Any suggestions on making this more approachable
would be very welcome.

Rather than 'cc' this to 

[PATCH 000 of 35] Refactor block layer to improve support for stacked devices.

2007-07-30 Thread NeilBrown

The following 35(!) patches achieve a refactoring of some parts of the
block layer to provide better support for stacked devices.

The core issue is that of letting bio_add_page know the limitation
that the device imposes so that it doesn't create a bio that is too large.

For a unstacked disk device (e.g. scsi), bio_add_page can access
max_nr_sectors and max_nr_segments and some other details to know how
segments should be counted, and does the appropriate checks (this is a
simplification, get is close enough for this discussion).

For stacked devices (dm, md etc) bio_add_page can also call into the
driver via merge_bvec_fn to find out if a page can be added to a bio.

This mostly works for a simple stack (e.g. md on scsi) but breaks down
with more complicated stacks (dm on md on scsi) as the recusive calls
to merge_bvec_fn that are required are difficult to get right, and
don't provide any guarantees in the face of array reconfiguration anyway.
dm and md both take the approach of if the never level down defines
merge_bvec_fn, then set max_sectors to PAGE_SIZE/512 and live with small
requests.

So this patchset introduces a new approach.  bio_add_page is allowed
to create bios as big as it likes, and each layer is responsible for
splitting that bio up as required.

For intermediate levels like raid0, a number of new bios might be
created which refer to parts of the original, including parts of the
bi_io_vec.

For the bottom level driver (__make_request), each struct request
can refer to just part of a bio, so a bio can be effectively split
among several requests (a request can still reference multiple small
bios, and can concievable list parts of large bios and some small bios
as well, though the merging required to achieve this isn't implemented
yet - that patch set is big enough as it is).

This requires that the bi_io_vec become immutable, and that certain
parts of the bio become immutable.

To achieve this, we introduce fields into the bio so that it can point
to just part of the bi_io_vec (an offset and a size) and introduce
similar fields into 'struct request' to refer to only part of a bio list.

I am keen to receive both review and testing.  I have tested it on
SATA drives with a range of md configurations, but haven't tested dm,
or ide-floppy, or various other bits that needed to be changed.

Probably the changes that are mostly likely to raise eyebrows involve
the code to iterate over the segments in a bio or in a 'struct
request', so I'll give a bit more detail about them here.

Previously these (bio_for_each_segment, rq_for_each_bio) were simple
macros that provided pointers into bi_io_vec.  

As the actual segments that a request might need to handle may no
longer be explicitly in bi_io_vec (e.g. an offset might need to be
added, or a size restriction might need to be imposed) this is no
longer possible.  Instead, these functions (now rq_for_each_segment
and bio_for_each_segment) fill in a 'struct bio_vec' with appropriate
values.  e.g.
  struct bio_vec bvec;
  struct bio_iterator i;
  bio_for_each_segment(bvec, bio, i)
use bvec.bv_page, bvec.bi_offset, bvec.bv_len

This might seem like data is being copied around a bit more, but it
should all be in L1 cache and could conceivable be optimised into
registers by the compiler, so I don't believe this is a big problem
(no, I haven't figured a good way to test it).

To achieve this, the for_each macros are now somewhat more complex.
For example, rq_for_each_segment is:

#define bio_for_each_segment_offset(bv, bio, _i, offs, _size)   \
for (_i.i = 0, _i.offset = (bio)-bi_offset + offs, \
 _i.size = min_t(int, _size, (bio)-bi_size - offs);\
 _i.i  (bio)-bi_vcnt  _i.size  0;  \
 _i.i++)\
if (bv = *bio_iovec_idx((bio), _i.i),   \
bv.bv_offset += _i.offset,  \
bv.bv_len = _i.offset  \
? (_i.offset -= bv.bv_len, 0)   \
: (bv.bv_len -= _i.offset,  \
   _i.offset = 0,   \
   bv.bv_len  _i.size  \
   ? (_i.size -= bv.bv_len, 1)  \
   : (bv.bv_len = _i.size,  \
  _i.size = 0,  \
  bv.bv_len  0)))

#define bio_for_each_segment(bv, bio, __i)  \
bio_for_each_segment_offset(bv, bio, __i, 0, (bio)-bi_size)

It does some with some explanatory text in a comment, but it is still
a bit daunting.  Any suggestions on making this more approachable
would be very welcome.

Rather than 'cc' this to various lists or