Re: [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-21 Thread Konrad Rzeszutek Wilk
On Fri, Mar 08, 2013 at 06:07:08PM +0100, Roger Pau Monné wrote:
> On 05/03/13 22:46, Konrad Rzeszutek Wilk wrote:
> > On Tue, Mar 05, 2013 at 06:07:57PM +0100, Roger Pau Monné wrote:
> >> On 04/03/13 21:41, Konrad Rzeszutek Wilk wrote:
> >>> On Thu, Feb 28, 2013 at 11:28:55AM +0100, Roger Pau Monne wrote:
>  Indirect descriptors introduce a new block operation
>  (BLKIF_OP_INDIRECT) that passes grant references instead of segments
>  in the request. This grant references are filled with arrays of
>  blkif_request_segment_aligned, this way we can send more segments in a
>  request.
> 
>  The proposed implementation sets the maximum number of indirect grefs
>  (frames filled with blkif_request_segment_aligned) to 256 in the
>  backend and 64 in the frontend. The value in the frontend has been
>  chosen experimentally, and the backend value has been set to a sane
>  value that allows expanding the maximum number of indirect descriptors
>  in the frontend if needed.
> >>>
> >>> So we are still using a similar format of the form:
> >>>
> >>> , etc.
> >>>
> >>> Why not utilize a layout that fits with the bio sg? That way
> >>> we might not even have to do the bio_alloc call and instead can
> >>> setup an bio (and bio-list) with the appropiate offsets/list?
> 
> I think we can already do this without changing the structure of the
> segments, we could just allocate a bio big enough to hold all the
> segments and queue them up (provided that the underlying storage device
> supports bios of this size).
> 
> bio = bio_alloc(GFP_KERNEL, nseg);
> if (unlikely(bio == NULL))
>   goto fail_put_bio;
> biolist[nbio++] = bio;
> bio->bi_bdev= preq.bdev;
> bio->bi_private = pending_req;
> bio->bi_end_io  = end_block_io_op;
> bio->bi_sector  = preq.sector_number;
> 
> for (i = 0; i < nseg; i++) {
>   rc = bio_add_page(bio, pages[i], seg[i].nsec << 9,
>   seg[i].buf & ~PAGE_MASK);
>   if (rc == 0)
>   goto fail_put_bio;
> }
> 
> This seems to work with Linux blkfront/blkback, and I guess biolist in
> blkback only has one bio all the time.

> 
> >>> Meaning that the format of the indirect descriptors is:
> >>>
> >>> 
> 
> Don't we need a length parameter? Also, next_index will be current+1,
> because we already send the segments sorted (using for_each_sg) in blkfront.
> 
> >>>
> >>> We already know what the first_sec and last_sect are - they
> >>> are basically: sector_number +  nr_segments * (whatever the sector size 
> >>> is) + offset
> >>
> >> This will of course be suitable for Linux, but what about other OSes, I
> >> know they support the traditional first_sec, last_sect (because it's
> >> already implemented), but I don't know how much work will it be for them
> >> to adopt this. If we have to do such a change I will have to check first
> >> that other frontend/backend can handle this easily also, I wouldn't like
> >> to simplify this for Linux by making it more difficult to implement in
> >> other OSes...
> > 
> > I would think that most OSes use the same framework. The ones that
> > are of notable interest are the Windows and BSD. Lets CC James here
> 
> Maybe I'm missing something here, but I don't see a really big benefit
> of using this new structure for segments instead of the current one.

The DIF/DIX requires that the bio layout going in blkfront and then
emerging on the other side in the SAS/SCSI/SATA drivers must be the same.

That means when you have a bio-vec, for example, where there are
five pages linked - the first four have 512 bytes of data (say in the middle
of the page - so 2048 -> 2560 are occupied, the rest is not). The total
is 2048 bytes, and the last page contains 32 bytes (four CRC checksums, each
8 bytes).

If we coalesce any of the five pages in one, then we need to (when we
take the request out of the ring) in the backend, to reconstruct these
five pages. 

My thought was that with the fsect, lsect as they exist now, we will be 
tempted to just colesce four sectors in a page and just make lsect = fsect + 4.

That however is _not_ what we are doing now - I think. We look to recreate
the layout exactly as the READ/WRITE requests are set to xen-blkfront.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-21 Thread Konrad Rzeszutek Wilk
On Fri, Mar 08, 2013 at 06:07:08PM +0100, Roger Pau Monné wrote:
 On 05/03/13 22:46, Konrad Rzeszutek Wilk wrote:
  On Tue, Mar 05, 2013 at 06:07:57PM +0100, Roger Pau Monné wrote:
  On 04/03/13 21:41, Konrad Rzeszutek Wilk wrote:
  On Thu, Feb 28, 2013 at 11:28:55AM +0100, Roger Pau Monne wrote:
  Indirect descriptors introduce a new block operation
  (BLKIF_OP_INDIRECT) that passes grant references instead of segments
  in the request. This grant references are filled with arrays of
  blkif_request_segment_aligned, this way we can send more segments in a
  request.
 
  The proposed implementation sets the maximum number of indirect grefs
  (frames filled with blkif_request_segment_aligned) to 256 in the
  backend and 64 in the frontend. The value in the frontend has been
  chosen experimentally, and the backend value has been set to a sane
  value that allows expanding the maximum number of indirect descriptors
  in the frontend if needed.
 
  So we are still using a similar format of the form:
 
  gref, first_sec, last_sect, pad, etc.
 
  Why not utilize a layout that fits with the bio sg? That way
  we might not even have to do the bio_alloc call and instead can
  setup an bio (and bio-list) with the appropiate offsets/list?
 
 I think we can already do this without changing the structure of the
 segments, we could just allocate a bio big enough to hold all the
 segments and queue them up (provided that the underlying storage device
 supports bios of this size).
 
 bio = bio_alloc(GFP_KERNEL, nseg);
 if (unlikely(bio == NULL))
   goto fail_put_bio;
 biolist[nbio++] = bio;
 bio-bi_bdev= preq.bdev;
 bio-bi_private = pending_req;
 bio-bi_end_io  = end_block_io_op;
 bio-bi_sector  = preq.sector_number;
 
 for (i = 0; i  nseg; i++) {
   rc = bio_add_page(bio, pages[i], seg[i].nsec  9,
   seg[i].buf  ~PAGE_MASK);
   if (rc == 0)
   goto fail_put_bio;
 }
 
 This seems to work with Linux blkfront/blkback, and I guess biolist in
 blkback only has one bio all the time.

 
  Meaning that the format of the indirect descriptors is:
 
  gref, offset, next_index, pad
 
 Don't we need a length parameter? Also, next_index will be current+1,
 because we already send the segments sorted (using for_each_sg) in blkfront.
 
 
  We already know what the first_sec and last_sect are - they
  are basically: sector_number +  nr_segments * (whatever the sector size 
  is) + offset
 
  This will of course be suitable for Linux, but what about other OSes, I
  know they support the traditional first_sec, last_sect (because it's
  already implemented), but I don't know how much work will it be for them
  to adopt this. If we have to do such a change I will have to check first
  that other frontend/backend can handle this easily also, I wouldn't like
  to simplify this for Linux by making it more difficult to implement in
  other OSes...
  
  I would think that most OSes use the same framework. The ones that
  are of notable interest are the Windows and BSD. Lets CC James here
 
 Maybe I'm missing something here, but I don't see a really big benefit
 of using this new structure for segments instead of the current one.

The DIF/DIX requires that the bio layout going in blkfront and then
emerging on the other side in the SAS/SCSI/SATA drivers must be the same.

That means when you have a bio-vec, for example, where there are
five pages linked - the first four have 512 bytes of data (say in the middle
of the page - so 2048 - 2560 are occupied, the rest is not). The total
is 2048 bytes, and the last page contains 32 bytes (four CRC checksums, each
8 bytes).

If we coalesce any of the five pages in one, then we need to (when we
take the request out of the ring) in the backend, to reconstruct these
five pages. 

My thought was that with the fsect, lsect as they exist now, we will be 
tempted to just colesce four sectors in a page and just make lsect = fsect + 4.

That however is _not_ what we are doing now - I think. We look to recreate
the layout exactly as the READ/WRITE requests are set to xen-blkfront.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-19 Thread Konrad Rzeszutek Wilk
On Mon, Mar 18, 2013 at 06:06:38PM +0100, Roger Pau Monné wrote:
> On 28/02/13 11:28, Roger Pau Monne wrote:
> > Indirect descriptors introduce a new block operation
> > (BLKIF_OP_INDIRECT) that passes grant references instead of segments
> > in the request. This grant references are filled with arrays of
> > blkif_request_segment_aligned, this way we can send more segments in a
> > request.
> > 
> > The proposed implementation sets the maximum number of indirect grefs
> > (frames filled with blkif_request_segment_aligned) to 256 in the
> > backend and 64 in the frontend. The value in the frontend has been
> > chosen experimentally, and the backend value has been set to a sane
> > value that allows expanding the maximum number of indirect descriptors
> > in the frontend if needed.
> 
> I've added some additional debugging messages in blkfront, and found out
> that the queue in blkfront is not providing request bigger than 64
> segments for read requests, or 128 segments for write requests, although
> I set:
> 
> blk_queue_max_segments(info->rq, 256);
> 
> Is there any other limit I'm missing on the number of segments per
> request a queue can provide?

Martin, any ideas?
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-19 Thread Konrad Rzeszutek Wilk
On Mon, Mar 18, 2013 at 06:06:38PM +0100, Roger Pau Monné wrote:
 On 28/02/13 11:28, Roger Pau Monne wrote:
  Indirect descriptors introduce a new block operation
  (BLKIF_OP_INDIRECT) that passes grant references instead of segments
  in the request. This grant references are filled with arrays of
  blkif_request_segment_aligned, this way we can send more segments in a
  request.
  
  The proposed implementation sets the maximum number of indirect grefs
  (frames filled with blkif_request_segment_aligned) to 256 in the
  backend and 64 in the frontend. The value in the frontend has been
  chosen experimentally, and the backend value has been set to a sane
  value that allows expanding the maximum number of indirect descriptors
  in the frontend if needed.
 
 I've added some additional debugging messages in blkfront, and found out
 that the queue in blkfront is not providing request bigger than 64
 segments for read requests, or 128 segments for write requests, although
 I set:
 
 blk_queue_max_segments(info-rq, 256);
 
 Is there any other limit I'm missing on the number of segments per
 request a queue can provide?

Martin, any ideas?
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-18 Thread Roger Pau Monné
On 28/02/13 11:28, Roger Pau Monne wrote:
> Indirect descriptors introduce a new block operation
> (BLKIF_OP_INDIRECT) that passes grant references instead of segments
> in the request. This grant references are filled with arrays of
> blkif_request_segment_aligned, this way we can send more segments in a
> request.
> 
> The proposed implementation sets the maximum number of indirect grefs
> (frames filled with blkif_request_segment_aligned) to 256 in the
> backend and 64 in the frontend. The value in the frontend has been
> chosen experimentally, and the backend value has been set to a sane
> value that allows expanding the maximum number of indirect descriptors
> in the frontend if needed.

I've added some additional debugging messages in blkfront, and found out
that the queue in blkfront is not providing request bigger than 64
segments for read requests, or 128 segments for write requests, although
I set:

blk_queue_max_segments(info->rq, 256);

Is there any other limit I'm missing on the number of segments per
request a queue can provide?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-18 Thread Roger Pau Monné
On 28/02/13 11:28, Roger Pau Monne wrote:
 Indirect descriptors introduce a new block operation
 (BLKIF_OP_INDIRECT) that passes grant references instead of segments
 in the request. This grant references are filled with arrays of
 blkif_request_segment_aligned, this way we can send more segments in a
 request.
 
 The proposed implementation sets the maximum number of indirect grefs
 (frames filled with blkif_request_segment_aligned) to 256 in the
 backend and 64 in the frontend. The value in the frontend has been
 chosen experimentally, and the backend value has been set to a sane
 value that allows expanding the maximum number of indirect descriptors
 in the frontend if needed.

I've added some additional debugging messages in blkfront, and found out
that the queue in blkfront is not providing request bigger than 64
segments for read requests, or 128 segments for write requests, although
I set:

blk_queue_max_segments(info-rq, 256);

Is there any other limit I'm missing on the number of segments per
request a queue can provide?

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-08 Thread Roger Pau Monné
On 05/03/13 22:46, Konrad Rzeszutek Wilk wrote:
> On Tue, Mar 05, 2013 at 06:07:57PM +0100, Roger Pau Monné wrote:
>> On 04/03/13 21:41, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Feb 28, 2013 at 11:28:55AM +0100, Roger Pau Monne wrote:
 Indirect descriptors introduce a new block operation
 (BLKIF_OP_INDIRECT) that passes grant references instead of segments
 in the request. This grant references are filled with arrays of
 blkif_request_segment_aligned, this way we can send more segments in a
 request.

 The proposed implementation sets the maximum number of indirect grefs
 (frames filled with blkif_request_segment_aligned) to 256 in the
 backend and 64 in the frontend. The value in the frontend has been
 chosen experimentally, and the backend value has been set to a sane
 value that allows expanding the maximum number of indirect descriptors
 in the frontend if needed.
>>>
>>> So we are still using a similar format of the form:
>>>
>>> , etc.
>>>
>>> Why not utilize a layout that fits with the bio sg? That way
>>> we might not even have to do the bio_alloc call and instead can
>>> setup an bio (and bio-list) with the appropiate offsets/list?

I think we can already do this without changing the structure of the
segments, we could just allocate a bio big enough to hold all the
segments and queue them up (provided that the underlying storage device
supports bios of this size).

bio = bio_alloc(GFP_KERNEL, nseg);
if (unlikely(bio == NULL))
goto fail_put_bio;
biolist[nbio++] = bio;
bio->bi_bdev= preq.bdev;
bio->bi_private = pending_req;
bio->bi_end_io  = end_block_io_op;
bio->bi_sector  = preq.sector_number;

for (i = 0; i < nseg; i++) {
rc = bio_add_page(bio, pages[i], seg[i].nsec << 9,
seg[i].buf & ~PAGE_MASK);
if (rc == 0)
goto fail_put_bio;
}

This seems to work with Linux blkfront/blkback, and I guess biolist in
blkback only has one bio all the time.

>>> Meaning that the format of the indirect descriptors is:
>>>
>>> 

Don't we need a length parameter? Also, next_index will be current+1,
because we already send the segments sorted (using for_each_sg) in blkfront.

>>>
>>> We already know what the first_sec and last_sect are - they
>>> are basically: sector_number +  nr_segments * (whatever the sector size is) 
>>> + offset
>>
>> This will of course be suitable for Linux, but what about other OSes, I
>> know they support the traditional first_sec, last_sect (because it's
>> already implemented), but I don't know how much work will it be for them
>> to adopt this. If we have to do such a change I will have to check first
>> that other frontend/backend can handle this easily also, I wouldn't like
>> to simplify this for Linux by making it more difficult to implement in
>> other OSes...
> 
> I would think that most OSes use the same framework. The ones that
> are of notable interest are the Windows and BSD. Lets CC James here

Maybe I'm missing something here, but I don't see a really big benefit
of using this new structure for segments instead of the current one.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-08 Thread Roger Pau Monné
On 05/03/13 22:46, Konrad Rzeszutek Wilk wrote:
 On Tue, Mar 05, 2013 at 06:07:57PM +0100, Roger Pau Monné wrote:
 On 04/03/13 21:41, Konrad Rzeszutek Wilk wrote:
 On Thu, Feb 28, 2013 at 11:28:55AM +0100, Roger Pau Monne wrote:
 Indirect descriptors introduce a new block operation
 (BLKIF_OP_INDIRECT) that passes grant references instead of segments
 in the request. This grant references are filled with arrays of
 blkif_request_segment_aligned, this way we can send more segments in a
 request.

 The proposed implementation sets the maximum number of indirect grefs
 (frames filled with blkif_request_segment_aligned) to 256 in the
 backend and 64 in the frontend. The value in the frontend has been
 chosen experimentally, and the backend value has been set to a sane
 value that allows expanding the maximum number of indirect descriptors
 in the frontend if needed.

 So we are still using a similar format of the form:

 gref, first_sec, last_sect, pad, etc.

 Why not utilize a layout that fits with the bio sg? That way
 we might not even have to do the bio_alloc call and instead can
 setup an bio (and bio-list) with the appropiate offsets/list?

I think we can already do this without changing the structure of the
segments, we could just allocate a bio big enough to hold all the
segments and queue them up (provided that the underlying storage device
supports bios of this size).

bio = bio_alloc(GFP_KERNEL, nseg);
if (unlikely(bio == NULL))
goto fail_put_bio;
biolist[nbio++] = bio;
bio-bi_bdev= preq.bdev;
bio-bi_private = pending_req;
bio-bi_end_io  = end_block_io_op;
bio-bi_sector  = preq.sector_number;

for (i = 0; i  nseg; i++) {
rc = bio_add_page(bio, pages[i], seg[i].nsec  9,
seg[i].buf  ~PAGE_MASK);
if (rc == 0)
goto fail_put_bio;
}

This seems to work with Linux blkfront/blkback, and I guess biolist in
blkback only has one bio all the time.

 Meaning that the format of the indirect descriptors is:

 gref, offset, next_index, pad

Don't we need a length parameter? Also, next_index will be current+1,
because we already send the segments sorted (using for_each_sg) in blkfront.


 We already know what the first_sec and last_sect are - they
 are basically: sector_number +  nr_segments * (whatever the sector size is) 
 + offset

 This will of course be suitable for Linux, but what about other OSes, I
 know they support the traditional first_sec, last_sect (because it's
 already implemented), but I don't know how much work will it be for them
 to adopt this. If we have to do such a change I will have to check first
 that other frontend/backend can handle this easily also, I wouldn't like
 to simplify this for Linux by making it more difficult to implement in
 other OSes...
 
 I would think that most OSes use the same framework. The ones that
 are of notable interest are the Windows and BSD. Lets CC James here

Maybe I'm missing something here, but I don't see a really big benefit
of using this new structure for segments instead of the current one.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-05 Thread Konrad Rzeszutek Wilk
On Tue, Mar 05, 2013 at 06:07:57PM +0100, Roger Pau Monné wrote:
> On 04/03/13 21:41, Konrad Rzeszutek Wilk wrote:
> > On Thu, Feb 28, 2013 at 11:28:55AM +0100, Roger Pau Monne wrote:
> >> Indirect descriptors introduce a new block operation
> >> (BLKIF_OP_INDIRECT) that passes grant references instead of segments
> >> in the request. This grant references are filled with arrays of
> >> blkif_request_segment_aligned, this way we can send more segments in a
> >> request.
> >>
> >> The proposed implementation sets the maximum number of indirect grefs
> >> (frames filled with blkif_request_segment_aligned) to 256 in the
> >> backend and 64 in the frontend. The value in the frontend has been
> >> chosen experimentally, and the backend value has been set to a sane
> >> value that allows expanding the maximum number of indirect descriptors
> >> in the frontend if needed.
> > 
> > So we are still using a similar format of the form:
> > 
> > , etc.
> > 
> > Why not utilize a layout that fits with the bio sg? That way
> > we might not even have to do the bio_alloc call and instead can
> > setup an bio (and bio-list) with the appropiate offsets/list?
> > 
> > Meaning that the format of the indirect descriptors is:
> > 
> > 
> > 
> > We already know what the first_sec and last_sect are - they
> > are basically: sector_number +  nr_segments * (whatever the sector size is) 
> > + offset
> 
> This will of course be suitable for Linux, but what about other OSes, I
> know they support the traditional first_sec, last_sect (because it's
> already implemented), but I don't know how much work will it be for them
> to adopt this. If we have to do such a change I will have to check first
> that other frontend/backend can handle this easily also, I wouldn't like
> to simplify this for Linux by making it more difficult to implement in
> other OSes...

I would think that most OSes use the same framework. The ones that
are of notable interest are the Windows and BSD. Lets CC James here

> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-05 Thread Konrad Rzeszutek Wilk
On Tue, Mar 05, 2013 at 06:00:51PM +0100, Roger Pau Monné wrote:
> On 05/03/13 15:16, Konrad Rzeszutek Wilk wrote:
> > On Tue, Mar 05, 2013 at 08:11:19AM +, Jan Beulich wrote:
> > On 04.03.13 at 21:44, Konrad Rzeszutek Wilk  
> > wrote:
> >>>  'op' sounds good. With a comment saying it can do all of the 
> >>> BLKIF_OPS_..
> >>> except the BLKIF_OP_INDIRECT one. Thought one could in theory chain
> >>> it that way for fun.
> >>
> >> In fact I'd like to exclude chaining as well as BLKIF_OP_DISCARD here.
> >> The former should - if useful for anything - be controlled by a
> >> separate feature flag, and the latter is plain pointless to indirect.
> >> And I reckon the same would apply to BLKIF_OP_FLUSH_DISKCACHE
> >> and BLKIF_OP_RESERVED_1 - i.e. it might be better to state that
> >> indirection is only permitted for normal I/O (read/write) ops.
> > 
> >  That makes sense. And also of course the new BLKIF_OP should
> > be documented in the Xen tree as well.
> 
> The only ops that can be done indirectly are _READ, _WRITE and
> _BARRIER/_FLUSH. From the implementation in blkfront it seems like
> _FLUSH/_BARRIER requests can indeed contain segments, but I haven't been
> able to spot any _FLUSH op with segments on it. Can you confirm FLUSH
> requests never contain bios?

Not FLUSH per say. But the FUA should be able to provide data and the
flush semantics with it. Except we don't support FUA so this is irrelevant
 - unless in the future we want to intrduce FUA or WRITE with some extra
flags.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-05 Thread Roger Pau Monné
On 04/03/13 21:41, Konrad Rzeszutek Wilk wrote:
> On Thu, Feb 28, 2013 at 11:28:55AM +0100, Roger Pau Monne wrote:
>> Indirect descriptors introduce a new block operation
>> (BLKIF_OP_INDIRECT) that passes grant references instead of segments
>> in the request. This grant references are filled with arrays of
>> blkif_request_segment_aligned, this way we can send more segments in a
>> request.
>>
>> The proposed implementation sets the maximum number of indirect grefs
>> (frames filled with blkif_request_segment_aligned) to 256 in the
>> backend and 64 in the frontend. The value in the frontend has been
>> chosen experimentally, and the backend value has been set to a sane
>> value that allows expanding the maximum number of indirect descriptors
>> in the frontend if needed.
> 
> So we are still using a similar format of the form:
> 
> , etc.
> 
> Why not utilize a layout that fits with the bio sg? That way
> we might not even have to do the bio_alloc call and instead can
> setup an bio (and bio-list) with the appropiate offsets/list?
> 
> Meaning that the format of the indirect descriptors is:
> 
> 
> 
> We already know what the first_sec and last_sect are - they
> are basically: sector_number +  nr_segments * (whatever the sector size is) + 
> offset

This will of course be suitable for Linux, but what about other OSes, I
know they support the traditional first_sec, last_sect (because it's
already implemented), but I don't know how much work will it be for them
to adopt this. If we have to do such a change I will have to check first
that other frontend/backend can handle this easily also, I wouldn't like
to simplify this for Linux by making it more difficult to implement in
other OSes...

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-05 Thread Roger Pau Monné
On 05/03/13 15:16, Konrad Rzeszutek Wilk wrote:
> On Tue, Mar 05, 2013 at 08:11:19AM +, Jan Beulich wrote:
> On 04.03.13 at 21:44, Konrad Rzeszutek Wilk  
> wrote:
>>>  'op' sounds good. With a comment saying it can do all of the 
>>> BLKIF_OPS_..
>>> except the BLKIF_OP_INDIRECT one. Thought one could in theory chain
>>> it that way for fun.
>>
>> In fact I'd like to exclude chaining as well as BLKIF_OP_DISCARD here.
>> The former should - if useful for anything - be controlled by a
>> separate feature flag, and the latter is plain pointless to indirect.
>> And I reckon the same would apply to BLKIF_OP_FLUSH_DISKCACHE
>> and BLKIF_OP_RESERVED_1 - i.e. it might be better to state that
>> indirection is only permitted for normal I/O (read/write) ops.
> 
>  That makes sense. And also of course the new BLKIF_OP should
> be documented in the Xen tree as well.

The only ops that can be done indirectly are _READ, _WRITE and
_BARRIER/_FLUSH. From the implementation in blkfront it seems like
_FLUSH/_BARRIER requests can indeed contain segments, but I haven't been
able to spot any _FLUSH op with segments on it. Can you confirm FLUSH
requests never contain bios?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-05 Thread Konrad Rzeszutek Wilk
On Tue, Mar 05, 2013 at 08:11:19AM +, Jan Beulich wrote:
> >>> On 04.03.13 at 21:44, Konrad Rzeszutek Wilk  
> >>> wrote:
> >  'op' sounds good. With a comment saying it can do all of the 
> > BLKIF_OPS_..
> > except the BLKIF_OP_INDIRECT one. Thought one could in theory chain
> > it that way for fun.
> 
> In fact I'd like to exclude chaining as well as BLKIF_OP_DISCARD here.
> The former should - if useful for anything - be controlled by a
> separate feature flag, and the latter is plain pointless to indirect.
> And I reckon the same would apply to BLKIF_OP_FLUSH_DISKCACHE
> and BLKIF_OP_RESERVED_1 - i.e. it might be better to state that
> indirection is only permitted for normal I/O (read/write) ops.

 That makes sense. And also of course the new BLKIF_OP should
be documented in the Xen tree as well.

> 
> Jan
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-05 Thread Jan Beulich
>>> On 04.03.13 at 21:44, Konrad Rzeszutek Wilk  wrote:
>  'op' sounds good. With a comment saying it can do all of the 
> BLKIF_OPS_..
> except the BLKIF_OP_INDIRECT one. Thought one could in theory chain
> it that way for fun.

In fact I'd like to exclude chaining as well as BLKIF_OP_DISCARD here.
The former should - if useful for anything - be controlled by a
separate feature flag, and the latter is plain pointless to indirect.
And I reckon the same would apply to BLKIF_OP_FLUSH_DISKCACHE
and BLKIF_OP_RESERVED_1 - i.e. it might be better to state that
indirection is only permitted for normal I/O (read/write) ops.

Jan

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-05 Thread Jan Beulich
 On 04.03.13 at 21:44, Konrad Rzeszutek Wilk konrad.w...@oracle.com wrote:
 nods 'op' sounds good. With a comment saying it can do all of the 
 BLKIF_OPS_..
 except the BLKIF_OP_INDIRECT one. Thought one could in theory chain
 it that way for fun.

In fact I'd like to exclude chaining as well as BLKIF_OP_DISCARD here.
The former should - if useful for anything - be controlled by a
separate feature flag, and the latter is plain pointless to indirect.
And I reckon the same would apply to BLKIF_OP_FLUSH_DISKCACHE
and BLKIF_OP_RESERVED_1 - i.e. it might be better to state that
indirection is only permitted for normal I/O (read/write) ops.

Jan

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-05 Thread Konrad Rzeszutek Wilk
On Tue, Mar 05, 2013 at 08:11:19AM +, Jan Beulich wrote:
  On 04.03.13 at 21:44, Konrad Rzeszutek Wilk konrad.w...@oracle.com 
  wrote:
  nods 'op' sounds good. With a comment saying it can do all of the 
  BLKIF_OPS_..
  except the BLKIF_OP_INDIRECT one. Thought one could in theory chain
  it that way for fun.
 
 In fact I'd like to exclude chaining as well as BLKIF_OP_DISCARD here.
 The former should - if useful for anything - be controlled by a
 separate feature flag, and the latter is plain pointless to indirect.
 And I reckon the same would apply to BLKIF_OP_FLUSH_DISKCACHE
 and BLKIF_OP_RESERVED_1 - i.e. it might be better to state that
 indirection is only permitted for normal I/O (read/write) ops.

nods That makes sense. And also of course the new BLKIF_OP should
be documented in the Xen tree as well.

 
 Jan
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-05 Thread Roger Pau Monné
On 05/03/13 15:16, Konrad Rzeszutek Wilk wrote:
 On Tue, Mar 05, 2013 at 08:11:19AM +, Jan Beulich wrote:
 On 04.03.13 at 21:44, Konrad Rzeszutek Wilk konrad.w...@oracle.com 
 wrote:
 nods 'op' sounds good. With a comment saying it can do all of the 
 BLKIF_OPS_..
 except the BLKIF_OP_INDIRECT one. Thought one could in theory chain
 it that way for fun.

 In fact I'd like to exclude chaining as well as BLKIF_OP_DISCARD here.
 The former should - if useful for anything - be controlled by a
 separate feature flag, and the latter is plain pointless to indirect.
 And I reckon the same would apply to BLKIF_OP_FLUSH_DISKCACHE
 and BLKIF_OP_RESERVED_1 - i.e. it might be better to state that
 indirection is only permitted for normal I/O (read/write) ops.
 
 nods That makes sense. And also of course the new BLKIF_OP should
 be documented in the Xen tree as well.

The only ops that can be done indirectly are _READ, _WRITE and
_BARRIER/_FLUSH. From the implementation in blkfront it seems like
_FLUSH/_BARRIER requests can indeed contain segments, but I haven't been
able to spot any _FLUSH op with segments on it. Can you confirm FLUSH
requests never contain bios?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-05 Thread Roger Pau Monné
On 04/03/13 21:41, Konrad Rzeszutek Wilk wrote:
 On Thu, Feb 28, 2013 at 11:28:55AM +0100, Roger Pau Monne wrote:
 Indirect descriptors introduce a new block operation
 (BLKIF_OP_INDIRECT) that passes grant references instead of segments
 in the request. This grant references are filled with arrays of
 blkif_request_segment_aligned, this way we can send more segments in a
 request.

 The proposed implementation sets the maximum number of indirect grefs
 (frames filled with blkif_request_segment_aligned) to 256 in the
 backend and 64 in the frontend. The value in the frontend has been
 chosen experimentally, and the backend value has been set to a sane
 value that allows expanding the maximum number of indirect descriptors
 in the frontend if needed.
 
 So we are still using a similar format of the form:
 
 gref, first_sec, last_sect, pad, etc.
 
 Why not utilize a layout that fits with the bio sg? That way
 we might not even have to do the bio_alloc call and instead can
 setup an bio (and bio-list) with the appropiate offsets/list?
 
 Meaning that the format of the indirect descriptors is:
 
 gref, offset, next_index, pad
 
 We already know what the first_sec and last_sect are - they
 are basically: sector_number +  nr_segments * (whatever the sector size is) + 
 offset

This will of course be suitable for Linux, but what about other OSes, I
know they support the traditional first_sec, last_sect (because it's
already implemented), but I don't know how much work will it be for them
to adopt this. If we have to do such a change I will have to check first
that other frontend/backend can handle this easily also, I wouldn't like
to simplify this for Linux by making it more difficult to implement in
other OSes...

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-05 Thread Konrad Rzeszutek Wilk
On Tue, Mar 05, 2013 at 06:00:51PM +0100, Roger Pau Monné wrote:
 On 05/03/13 15:16, Konrad Rzeszutek Wilk wrote:
  On Tue, Mar 05, 2013 at 08:11:19AM +, Jan Beulich wrote:
  On 04.03.13 at 21:44, Konrad Rzeszutek Wilk konrad.w...@oracle.com 
  wrote:
  nods 'op' sounds good. With a comment saying it can do all of the 
  BLKIF_OPS_..
  except the BLKIF_OP_INDIRECT one. Thought one could in theory chain
  it that way for fun.
 
  In fact I'd like to exclude chaining as well as BLKIF_OP_DISCARD here.
  The former should - if useful for anything - be controlled by a
  separate feature flag, and the latter is plain pointless to indirect.
  And I reckon the same would apply to BLKIF_OP_FLUSH_DISKCACHE
  and BLKIF_OP_RESERVED_1 - i.e. it might be better to state that
  indirection is only permitted for normal I/O (read/write) ops.
  
  nods That makes sense. And also of course the new BLKIF_OP should
  be documented in the Xen tree as well.
 
 The only ops that can be done indirectly are _READ, _WRITE and
 _BARRIER/_FLUSH. From the implementation in blkfront it seems like
 _FLUSH/_BARRIER requests can indeed contain segments, but I haven't been
 able to spot any _FLUSH op with segments on it. Can you confirm FLUSH
 requests never contain bios?

Not FLUSH per say. But the FUA should be able to provide data and the
flush semantics with it. Except we don't support FUA so this is irrelevant
 - unless in the future we want to intrduce FUA or WRITE with some extra
flags.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-05 Thread Konrad Rzeszutek Wilk
On Tue, Mar 05, 2013 at 06:07:57PM +0100, Roger Pau Monné wrote:
 On 04/03/13 21:41, Konrad Rzeszutek Wilk wrote:
  On Thu, Feb 28, 2013 at 11:28:55AM +0100, Roger Pau Monne wrote:
  Indirect descriptors introduce a new block operation
  (BLKIF_OP_INDIRECT) that passes grant references instead of segments
  in the request. This grant references are filled with arrays of
  blkif_request_segment_aligned, this way we can send more segments in a
  request.
 
  The proposed implementation sets the maximum number of indirect grefs
  (frames filled with blkif_request_segment_aligned) to 256 in the
  backend and 64 in the frontend. The value in the frontend has been
  chosen experimentally, and the backend value has been set to a sane
  value that allows expanding the maximum number of indirect descriptors
  in the frontend if needed.
  
  So we are still using a similar format of the form:
  
  gref, first_sec, last_sect, pad, etc.
  
  Why not utilize a layout that fits with the bio sg? That way
  we might not even have to do the bio_alloc call and instead can
  setup an bio (and bio-list) with the appropiate offsets/list?
  
  Meaning that the format of the indirect descriptors is:
  
  gref, offset, next_index, pad
  
  We already know what the first_sec and last_sect are - they
  are basically: sector_number +  nr_segments * (whatever the sector size is) 
  + offset
 
 This will of course be suitable for Linux, but what about other OSes, I
 know they support the traditional first_sec, last_sect (because it's
 already implemented), but I don't know how much work will it be for them
 to adopt this. If we have to do such a change I will have to check first
 that other frontend/backend can handle this easily also, I wouldn't like
 to simplify this for Linux by making it more difficult to implement in
 other OSes...

I would think that most OSes use the same framework. The ones that
are of notable interest are the Windows and BSD. Lets CC James here

 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-04 Thread Konrad Rzeszutek Wilk
On Thu, Feb 28, 2013 at 01:28:48PM +, Jan Beulich wrote:
> >>> On 28.02.13 at 13:00, Roger Pau Monné wrote:
> > On 28/02/13 12:19, Jan Beulich wrote:
> > On 28.02.13 at 11:28, Roger Pau Monne  wrote:
> >>> @@ -109,6 +111,16 @@ typedef uint64_t blkif_sector_t;
> >>>   */
> >>>  #define BLKIF_MAX_SEGMENTS_PER_REQUEST 11
> >>>  
> >>> +#define BLKIF_MAX_INDIRECT_GREFS_PER_REQUEST 8
> >>> +
> >>> +struct blkif_request_segment_aligned {
> >>> + grant_ref_t gref;/* reference to I/O buffer frame*/
> >>> + /* @first_sect: first sector in frame to transfer (inclusive).   */
> >>> + /* @last_sect: last sector in frame to transfer (inclusive). */
> >>> + uint8_t first_sect, last_sect;
> >>> + uint16_t_pad; /* padding to make it 8 bytes, so it's cache-aligned 
> >>> */
> >>> +} __attribute__((__packed__));
> >> 
> >> What's the __packed__ for here?
> > 
> > Yes, that's not needed.
> > 
> >> 
> >>> +
> >>>  struct blkif_request_rw {
> >>>   uint8_tnr_segments;  /* number of segments   */
> >>>   blkif_vdev_t   handle;   /* only for read/write requests */
> >>> @@ -138,11 +150,24 @@ struct blkif_request_discard {
> >>>   uint8_t_pad3;
> >>>  } __attribute__((__packed__));
> >>>  
> >>> +struct blkif_request_indirect {
> >>> + uint8_tindirect_op;
> >>> + uint16_t   nr_segments;
> >>> +#ifdef CONFIG_X86_64
> >>> + uint32_t   _pad1;/* offsetof(blkif_...,u.indirect.id) == 8 
> >>> */
> >>> +#endif
> >> 
> >> Either you want the structure be packed tightly (and you don't care
> >> about misaligned fields), in which case you shouldn't need a padding
> >> field. That's even more so as there's no padding between indirect_op
> >> and nr_segments, so everything is misaligned anyway, and the
> >> comment above is wrong too (offsetof() really ought to yield 7 in
> >> that case).
> > 
> > This padding is because we want to have the "id" field at the same
> > position as blkif_request_rw, so we need to add the padding for it to
> > match 32 & 64 bit blkif_request_rw structures, this prevents adding some
> > "if (req.op == BLKIF_OP_INDIRECT)..." if we only need to get the id of
> > the request.
> 
> Oh, right, that's desirable of course.
> 
> > The comment is indeed wrong, I've copied it from blkif_request_discard
> > and forgot to change the offset
> 
> But the offset stated there then is right after all - I forgot that
> there is a 1-byte field outside the union (the way this is being done
> in the upstream Linux header is really ugly imo, but I guess Jeremy
> and/or Konrad liked it that way). That's also why the packed
> attribute is needed here.

I am not particularly found as I keep on forgetting about the 1-byte field
as well. If you have a patch to clean it up would love to see it.

> 
> But you will probably want to switch sector_number and handle, so
> that sector_number becomes aligned, and add another 16-bit
> padding field between handle and indirect_grefs[].
> 
> I also wonder whether "indirect_op" wouldn't better be named
> "actual_op" or just "op".

 'op' sounds good. With a comment saying it can do all of the BLKIF_OPS_..
except the BLKIF_OP_INDIRECT one. Thought one could in theory chain
it that way for fun.

> 
> Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-04 Thread Konrad Rzeszutek Wilk
On Thu, Feb 28, 2013 at 11:28:55AM +0100, Roger Pau Monne wrote:
> Indirect descriptors introduce a new block operation
> (BLKIF_OP_INDIRECT) that passes grant references instead of segments
> in the request. This grant references are filled with arrays of
> blkif_request_segment_aligned, this way we can send more segments in a
> request.
> 
> The proposed implementation sets the maximum number of indirect grefs
> (frames filled with blkif_request_segment_aligned) to 256 in the
> backend and 64 in the frontend. The value in the frontend has been
> chosen experimentally, and the backend value has been set to a sane
> value that allows expanding the maximum number of indirect descriptors
> in the frontend if needed.

So we are still using a similar format of the form:

, etc.

Why not utilize a layout that fits with the bio sg? That way
we might not even have to do the bio_alloc call and instead can
setup an bio (and bio-list) with the appropiate offsets/list?

Meaning that the format of the indirect descriptors is:



We already know what the first_sec and last_sect are - they
are basically: sector_number +  nr_segments * (whatever the sector size is) + 
offset



> 
> The migration code has changed from the previous implementation, in
> which we simply remapped the segments on the shared ring. Now the
> maximum number of segments allowed in a request can change depending
> on the backend, so we have to requeue all the requests in the ring and
> in the queue and split the bios in them if they are bigger than the
> new maximum number of segments.
> 
> Signed-off-by: Roger Pau Monné 
> Cc: Konrad Rzeszutek Wilk 
> Cc: xen-de...@lists.xen.org
> ---
>  drivers/block/xen-blkback/blkback.c |  129 +++---
>  drivers/block/xen-blkback/common.h  |   80 ++-
>  drivers/block/xen-blkback/xenbus.c  |8 +
>  drivers/block/xen-blkfront.c|  498 
> +--
>  include/xen/interface/io/blkif.h|   25 ++
>  5 files changed, 622 insertions(+), 118 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/blkback.c 
> b/drivers/block/xen-blkback/blkback.c
> index 0fa30db..98eb16b 100644
> --- a/drivers/block/xen-blkback/blkback.c
> +++ b/drivers/block/xen-blkback/blkback.c
> @@ -70,7 +70,7 @@ MODULE_PARM_DESC(reqs, "Number of blkback requests to 
> allocate per backend");
>   * algorithm.
>   */
>  
> -static int xen_blkif_max_pgrants = 352;
> +static int xen_blkif_max_pgrants = 1024;
>  module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
>  MODULE_PARM_DESC(max_persistent_grants,
>   "Maximum number of grants to map persistently");
> @@ -578,10 +578,6 @@ purge_gnt_list:
>   return 0;
>  }
>  
> -struct seg_buf {
> - unsigned long buf;
> - unsigned int nsec;
> -};
>  /*
>   * Unmap the grant references, and also remove the M2P over-rides
>   * used in the 'pending_req'.
> @@ -761,32 +757,79 @@ out_of_memory:
>   return -ENOMEM;
>  }
>  
> -static int xen_blkbk_map_seg(struct blkif_request *req,
> -  struct pending_req *pending_req,
> +static int xen_blkbk_map_seg(struct pending_req *pending_req,
>struct seg_buf seg[],
>struct page *pages[])
>  {
>   int i, rc;
> - grant_ref_t grefs[BLKIF_MAX_SEGMENTS_PER_REQUEST];
>  
> - for (i = 0; i < req->u.rw.nr_segments; i++)
> - grefs[i] = req->u.rw.seg[i].gref;
> -
> - rc = xen_blkbk_map(pending_req->blkif, grefs,
> + rc = xen_blkbk_map(pending_req->blkif, pending_req->grefs,
>  pending_req->persistent_gnts,
>  pending_req->grant_handles, pending_req->pages,
> -req->u.rw.nr_segments,
> +pending_req->nr_pages,
>  (pending_req->operation != BLKIF_OP_READ));
>   if (rc)
>   return rc;
>  
> - for (i = 0; i < req->u.rw.nr_segments; i++)
> - seg[i].buf = pfn_to_mfn(page_to_pfn(pending_req->pages[i]))
> -  << PAGE_SHIFT | (req->u.rw.seg[i].first_sect << 9);
> + for (i = 0; i < pending_req->nr_pages; i++)
> + seg[i].buf |= pfn_to_mfn(page_to_pfn(pending_req->pages[i]))
> +  << PAGE_SHIFT;
>  
>   return 0;
>  }
>  
> +static int xen_blkbk_parse_indirect(struct blkif_request *req,
> +struct pending_req *pending_req,
> +struct seg_buf seg[],
> +struct phys_req *preq)
> +{
> + struct persistent_gnt **persistent =
> + pending_req->indirect_persistent_gnts;
> + struct page **pages = pending_req->indirect_pages;
> + struct xen_blkif *blkif = pending_req->blkif;
> + int indirect_grefs, rc, n, nseg, i;
> + struct blkif_request_segment_aligned *segments = NULL;
> +
> + nseg = pending_req->nr_pages;
> + indirect_grefs = (nseg + 

Re: [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-04 Thread Konrad Rzeszutek Wilk
On Thu, Feb 28, 2013 at 11:28:55AM +0100, Roger Pau Monne wrote:
 Indirect descriptors introduce a new block operation
 (BLKIF_OP_INDIRECT) that passes grant references instead of segments
 in the request. This grant references are filled with arrays of
 blkif_request_segment_aligned, this way we can send more segments in a
 request.
 
 The proposed implementation sets the maximum number of indirect grefs
 (frames filled with blkif_request_segment_aligned) to 256 in the
 backend and 64 in the frontend. The value in the frontend has been
 chosen experimentally, and the backend value has been set to a sane
 value that allows expanding the maximum number of indirect descriptors
 in the frontend if needed.

So we are still using a similar format of the form:

gref, first_sec, last_sect, pad, etc.

Why not utilize a layout that fits with the bio sg? That way
we might not even have to do the bio_alloc call and instead can
setup an bio (and bio-list) with the appropiate offsets/list?

Meaning that the format of the indirect descriptors is:

gref, offset, next_index, pad

We already know what the first_sec and last_sect are - they
are basically: sector_number +  nr_segments * (whatever the sector size is) + 
offset



 
 The migration code has changed from the previous implementation, in
 which we simply remapped the segments on the shared ring. Now the
 maximum number of segments allowed in a request can change depending
 on the backend, so we have to requeue all the requests in the ring and
 in the queue and split the bios in them if they are bigger than the
 new maximum number of segments.
 
 Signed-off-by: Roger Pau Monné roger@citrix.com
 Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 Cc: xen-de...@lists.xen.org
 ---
  drivers/block/xen-blkback/blkback.c |  129 +++---
  drivers/block/xen-blkback/common.h  |   80 ++-
  drivers/block/xen-blkback/xenbus.c  |8 +
  drivers/block/xen-blkfront.c|  498 
 +--
  include/xen/interface/io/blkif.h|   25 ++
  5 files changed, 622 insertions(+), 118 deletions(-)
 
 diff --git a/drivers/block/xen-blkback/blkback.c 
 b/drivers/block/xen-blkback/blkback.c
 index 0fa30db..98eb16b 100644
 --- a/drivers/block/xen-blkback/blkback.c
 +++ b/drivers/block/xen-blkback/blkback.c
 @@ -70,7 +70,7 @@ MODULE_PARM_DESC(reqs, Number of blkback requests to 
 allocate per backend);
   * algorithm.
   */
  
 -static int xen_blkif_max_pgrants = 352;
 +static int xen_blkif_max_pgrants = 1024;
  module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
  MODULE_PARM_DESC(max_persistent_grants,
   Maximum number of grants to map persistently);
 @@ -578,10 +578,6 @@ purge_gnt_list:
   return 0;
  }
  
 -struct seg_buf {
 - unsigned long buf;
 - unsigned int nsec;
 -};
  /*
   * Unmap the grant references, and also remove the M2P over-rides
   * used in the 'pending_req'.
 @@ -761,32 +757,79 @@ out_of_memory:
   return -ENOMEM;
  }
  
 -static int xen_blkbk_map_seg(struct blkif_request *req,
 -  struct pending_req *pending_req,
 +static int xen_blkbk_map_seg(struct pending_req *pending_req,
struct seg_buf seg[],
struct page *pages[])
  {
   int i, rc;
 - grant_ref_t grefs[BLKIF_MAX_SEGMENTS_PER_REQUEST];
  
 - for (i = 0; i  req-u.rw.nr_segments; i++)
 - grefs[i] = req-u.rw.seg[i].gref;
 -
 - rc = xen_blkbk_map(pending_req-blkif, grefs,
 + rc = xen_blkbk_map(pending_req-blkif, pending_req-grefs,
  pending_req-persistent_gnts,
  pending_req-grant_handles, pending_req-pages,
 -req-u.rw.nr_segments,
 +pending_req-nr_pages,
  (pending_req-operation != BLKIF_OP_READ));
   if (rc)
   return rc;
  
 - for (i = 0; i  req-u.rw.nr_segments; i++)
 - seg[i].buf = pfn_to_mfn(page_to_pfn(pending_req-pages[i]))
 -   PAGE_SHIFT | (req-u.rw.seg[i].first_sect  9);
 + for (i = 0; i  pending_req-nr_pages; i++)
 + seg[i].buf |= pfn_to_mfn(page_to_pfn(pending_req-pages[i]))
 +   PAGE_SHIFT;
  
   return 0;
  }
  
 +static int xen_blkbk_parse_indirect(struct blkif_request *req,
 +struct pending_req *pending_req,
 +struct seg_buf seg[],
 +struct phys_req *preq)
 +{
 + struct persistent_gnt **persistent =
 + pending_req-indirect_persistent_gnts;
 + struct page **pages = pending_req-indirect_pages;
 + struct xen_blkif *blkif = pending_req-blkif;
 + int indirect_grefs, rc, n, nseg, i;
 + struct blkif_request_segment_aligned *segments = NULL;
 +
 + nseg = pending_req-nr_pages;
 + indirect_grefs = (nseg + SEGS_PER_INDIRECT_FRAME - 1) /
 +

Re: [Xen-devel] [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-03-04 Thread Konrad Rzeszutek Wilk
On Thu, Feb 28, 2013 at 01:28:48PM +, Jan Beulich wrote:
  On 28.02.13 at 13:00, Roger Pau Monnéroger@citrix.com wrote:
  On 28/02/13 12:19, Jan Beulich wrote:
  On 28.02.13 at 11:28, Roger Pau Monne roger@citrix.com wrote:
  @@ -109,6 +111,16 @@ typedef uint64_t blkif_sector_t;
*/
   #define BLKIF_MAX_SEGMENTS_PER_REQUEST 11
   
  +#define BLKIF_MAX_INDIRECT_GREFS_PER_REQUEST 8
  +
  +struct blkif_request_segment_aligned {
  + grant_ref_t gref;/* reference to I/O buffer frame*/
  + /* @first_sect: first sector in frame to transfer (inclusive).   */
  + /* @last_sect: last sector in frame to transfer (inclusive). */
  + uint8_t first_sect, last_sect;
  + uint16_t_pad; /* padding to make it 8 bytes, so it's cache-aligned 
  */
  +} __attribute__((__packed__));
  
  What's the __packed__ for here?
  
  Yes, that's not needed.
  
  
  +
   struct blkif_request_rw {
uint8_tnr_segments;  /* number of segments   */
blkif_vdev_t   handle;   /* only for read/write requests */
  @@ -138,11 +150,24 @@ struct blkif_request_discard {
uint8_t_pad3;
   } __attribute__((__packed__));
   
  +struct blkif_request_indirect {
  + uint8_tindirect_op;
  + uint16_t   nr_segments;
  +#ifdef CONFIG_X86_64
  + uint32_t   _pad1;/* offsetof(blkif_...,u.indirect.id) == 8 
  */
  +#endif
  
  Either you want the structure be packed tightly (and you don't care
  about misaligned fields), in which case you shouldn't need a padding
  field. That's even more so as there's no padding between indirect_op
  and nr_segments, so everything is misaligned anyway, and the
  comment above is wrong too (offsetof() really ought to yield 7 in
  that case).
  
  This padding is because we want to have the id field at the same
  position as blkif_request_rw, so we need to add the padding for it to
  match 32  64 bit blkif_request_rw structures, this prevents adding some
  if (req.op == BLKIF_OP_INDIRECT)... if we only need to get the id of
  the request.
 
 Oh, right, that's desirable of course.
 
  The comment is indeed wrong, I've copied it from blkif_request_discard
  and forgot to change the offset
 
 But the offset stated there then is right after all - I forgot that
 there is a 1-byte field outside the union (the way this is being done
 in the upstream Linux header is really ugly imo, but I guess Jeremy
 and/or Konrad liked it that way). That's also why the packed
 attribute is needed here.

I am not particularly found as I keep on forgetting about the 1-byte field
as well. If you have a patch to clean it up would love to see it.

 
 But you will probably want to switch sector_number and handle, so
 that sector_number becomes aligned, and add another 16-bit
 padding field between handle and indirect_grefs[].
 
 I also wonder whether indirect_op wouldn't better be named
 actual_op or just op.

nods 'op' sounds good. With a comment saying it can do all of the BLKIF_OPS_..
except the BLKIF_OP_INDIRECT one. Thought one could in theory chain
it that way for fun.

 
 Jan
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-02-28 Thread Jan Beulich
>>> On 28.02.13 at 13:00, Roger Pau Monné wrote:
> On 28/02/13 12:19, Jan Beulich wrote:
> On 28.02.13 at 11:28, Roger Pau Monne  wrote:
>>> @@ -109,6 +111,16 @@ typedef uint64_t blkif_sector_t;
>>>   */
>>>  #define BLKIF_MAX_SEGMENTS_PER_REQUEST 11
>>>  
>>> +#define BLKIF_MAX_INDIRECT_GREFS_PER_REQUEST 8
>>> +
>>> +struct blkif_request_segment_aligned {
>>> +   grant_ref_t gref;/* reference to I/O buffer frame*/
>>> +   /* @first_sect: first sector in frame to transfer (inclusive).   */
>>> +   /* @last_sect: last sector in frame to transfer (inclusive). */
>>> +   uint8_t first_sect, last_sect;
>>> +   uint16_t_pad; /* padding to make it 8 bytes, so it's cache-aligned 
>>> */
>>> +} __attribute__((__packed__));
>> 
>> What's the __packed__ for here?
> 
> Yes, that's not needed.
> 
>> 
>>> +
>>>  struct blkif_request_rw {
>>> uint8_tnr_segments;  /* number of segments   */
>>> blkif_vdev_t   handle;   /* only for read/write requests */
>>> @@ -138,11 +150,24 @@ struct blkif_request_discard {
>>> uint8_t_pad3;
>>>  } __attribute__((__packed__));
>>>  
>>> +struct blkif_request_indirect {
>>> +   uint8_tindirect_op;
>>> +   uint16_t   nr_segments;
>>> +#ifdef CONFIG_X86_64
>>> +   uint32_t   _pad1;/* offsetof(blkif_...,u.indirect.id) == 8 
>>> */
>>> +#endif
>> 
>> Either you want the structure be packed tightly (and you don't care
>> about misaligned fields), in which case you shouldn't need a padding
>> field. That's even more so as there's no padding between indirect_op
>> and nr_segments, so everything is misaligned anyway, and the
>> comment above is wrong too (offsetof() really ought to yield 7 in
>> that case).
> 
> This padding is because we want to have the "id" field at the same
> position as blkif_request_rw, so we need to add the padding for it to
> match 32 & 64 bit blkif_request_rw structures, this prevents adding some
> "if (req.op == BLKIF_OP_INDIRECT)..." if we only need to get the id of
> the request.

Oh, right, that's desirable of course.

> The comment is indeed wrong, I've copied it from blkif_request_discard
> and forgot to change the offset

But the offset stated there then is right after all - I forgot that
there is a 1-byte field outside the union (the way this is being done
in the upstream Linux header is really ugly imo, but I guess Jeremy
and/or Konrad liked it that way). That's also why the packed
attribute is needed here.

But you will probably want to switch sector_number and handle, so
that sector_number becomes aligned, and add another 16-bit
padding field between handle and indirect_grefs[].

I also wonder whether "indirect_op" wouldn't better be named
"actual_op" or just "op".

Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-02-28 Thread Roger Pau Monné
On 28/02/13 12:19, Jan Beulich wrote:
 On 28.02.13 at 11:28, Roger Pau Monne  wrote:
>> @@ -109,6 +111,16 @@ typedef uint64_t blkif_sector_t;
>>   */
>>  #define BLKIF_MAX_SEGMENTS_PER_REQUEST 11
>>  
>> +#define BLKIF_MAX_INDIRECT_GREFS_PER_REQUEST 8
>> +
>> +struct blkif_request_segment_aligned {
>> +grant_ref_t gref;/* reference to I/O buffer frame*/
>> +/* @first_sect: first sector in frame to transfer (inclusive).   */
>> +/* @last_sect: last sector in frame to transfer (inclusive). */
>> +uint8_t first_sect, last_sect;
>> +uint16_t_pad; /* padding to make it 8 bytes, so it's cache-aligned 
>> */
>> +} __attribute__((__packed__));
> 
> What's the __packed__ for here?

Yes, that's not needed.

> 
>> +
>>  struct blkif_request_rw {
>>  uint8_tnr_segments;  /* number of segments   */
>>  blkif_vdev_t   handle;   /* only for read/write requests */
>> @@ -138,11 +150,24 @@ struct blkif_request_discard {
>>  uint8_t_pad3;
>>  } __attribute__((__packed__));
>>  
>> +struct blkif_request_indirect {
>> +uint8_tindirect_op;
>> +uint16_t   nr_segments;
>> +#ifdef CONFIG_X86_64
>> +uint32_t   _pad1;/* offsetof(blkif_...,u.indirect.id) == 8 
>> */
>> +#endif
> 
> Either you want the structure be packed tightly (and you don't care
> about misaligned fields), in which case you shouldn't need a padding
> field. That's even more so as there's no padding between indirect_op
> and nr_segments, so everything is misaligned anyway, and the
> comment above is wrong too (offsetof() really ought to yield 7 in
> that case).

This padding is because we want to have the "id" field at the same
position as blkif_request_rw, so we need to add the padding for it to
match 32 & 64 bit blkif_request_rw structures, this prevents adding some
"if (req.op == BLKIF_OP_INDIRECT)..." if we only need to get the id of
the request.

The comment is indeed wrong, I've copied it from blkif_request_discard
and forgot to change the offset

> 
> Or you want the structure fields aligned, in which case you again
> ought to drop the use of the __packed__ attribute and introduce
> _all_ necessary padding fields.
> 
>> +uint64_t   id;
>> +blkif_vdev_t   handle;
>> +blkif_sector_t sector_number;
>> +grant_ref_tindirect_grefs[BLKIF_MAX_INDIRECT_GREFS_PER_REQUEST];
>> +} __attribute__((__packed__));
> 
> And then it would be quite nice for new features to no longer
> require translation between a 32- and a 64-bit layout at all.

The translation is caused by the id issue described above.

> Plus, rather than introducing uninitialized padding fields, I'd
> suggest using fields that are required to be zero initialized, to
> allow giving them a meaning later.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-02-28 Thread Jan Beulich
>>> On 28.02.13 at 11:28, Roger Pau Monne  wrote:
> @@ -109,6 +111,16 @@ typedef uint64_t blkif_sector_t;
>   */
>  #define BLKIF_MAX_SEGMENTS_PER_REQUEST 11
>  
> +#define BLKIF_MAX_INDIRECT_GREFS_PER_REQUEST 8
> +
> +struct blkif_request_segment_aligned {
> + grant_ref_t gref;/* reference to I/O buffer frame*/
> + /* @first_sect: first sector in frame to transfer (inclusive).   */
> + /* @last_sect: last sector in frame to transfer (inclusive). */
> + uint8_t first_sect, last_sect;
> + uint16_t_pad; /* padding to make it 8 bytes, so it's cache-aligned 
> */
> +} __attribute__((__packed__));

What's the __packed__ for here?

> +
>  struct blkif_request_rw {
>   uint8_tnr_segments;  /* number of segments   */
>   blkif_vdev_t   handle;   /* only for read/write requests */
> @@ -138,11 +150,24 @@ struct blkif_request_discard {
>   uint8_t_pad3;
>  } __attribute__((__packed__));
>  
> +struct blkif_request_indirect {
> + uint8_tindirect_op;
> + uint16_t   nr_segments;
> +#ifdef CONFIG_X86_64
> + uint32_t   _pad1;/* offsetof(blkif_...,u.indirect.id) == 8 
> */
> +#endif

Either you want the structure be packed tightly (and you don't care
about misaligned fields), in which case you shouldn't need a padding
field. That's even more so as there's no padding between indirect_op
and nr_segments, so everything is misaligned anyway, and the
comment above is wrong too (offsetof() really ought to yield 7 in
that case).

Or you want the structure fields aligned, in which case you again
ought to drop the use of the __packed__ attribute and introduce
_all_ necessary padding fields.

> + uint64_t   id;
> + blkif_vdev_t   handle;
> + blkif_sector_t sector_number;
> + grant_ref_tindirect_grefs[BLKIF_MAX_INDIRECT_GREFS_PER_REQUEST];
> +} __attribute__((__packed__));

And then it would be quite nice for new features to no longer
require translation between a 32- and a 64-bit layout at all.

Plus, rather than introducing uninitialized padding fields, I'd
suggest using fields that are required to be zero initialized, to
allow giving them a meaning later.

Jan

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-02-28 Thread Roger Pau Monne
Indirect descriptors introduce a new block operation
(BLKIF_OP_INDIRECT) that passes grant references instead of segments
in the request. This grant references are filled with arrays of
blkif_request_segment_aligned, this way we can send more segments in a
request.

The proposed implementation sets the maximum number of indirect grefs
(frames filled with blkif_request_segment_aligned) to 256 in the
backend and 64 in the frontend. The value in the frontend has been
chosen experimentally, and the backend value has been set to a sane
value that allows expanding the maximum number of indirect descriptors
in the frontend if needed.

The migration code has changed from the previous implementation, in
which we simply remapped the segments on the shared ring. Now the
maximum number of segments allowed in a request can change depending
on the backend, so we have to requeue all the requests in the ring and
in the queue and split the bios in them if they are bigger than the
new maximum number of segments.

Signed-off-by: Roger Pau Monné 
Cc: Konrad Rzeszutek Wilk 
Cc: xen-de...@lists.xen.org
---
 drivers/block/xen-blkback/blkback.c |  129 +++---
 drivers/block/xen-blkback/common.h  |   80 ++-
 drivers/block/xen-blkback/xenbus.c  |8 +
 drivers/block/xen-blkfront.c|  498 +--
 include/xen/interface/io/blkif.h|   25 ++
 5 files changed, 622 insertions(+), 118 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 0fa30db..98eb16b 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -70,7 +70,7 @@ MODULE_PARM_DESC(reqs, "Number of blkback requests to 
allocate per backend");
  * algorithm.
  */
 
-static int xen_blkif_max_pgrants = 352;
+static int xen_blkif_max_pgrants = 1024;
 module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
  "Maximum number of grants to map persistently");
@@ -578,10 +578,6 @@ purge_gnt_list:
return 0;
 }
 
-struct seg_buf {
-   unsigned long buf;
-   unsigned int nsec;
-};
 /*
  * Unmap the grant references, and also remove the M2P over-rides
  * used in the 'pending_req'.
@@ -761,32 +757,79 @@ out_of_memory:
return -ENOMEM;
 }
 
-static int xen_blkbk_map_seg(struct blkif_request *req,
-struct pending_req *pending_req,
+static int xen_blkbk_map_seg(struct pending_req *pending_req,
 struct seg_buf seg[],
 struct page *pages[])
 {
int i, rc;
-   grant_ref_t grefs[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 
-   for (i = 0; i < req->u.rw.nr_segments; i++)
-   grefs[i] = req->u.rw.seg[i].gref;
-
-   rc = xen_blkbk_map(pending_req->blkif, grefs,
+   rc = xen_blkbk_map(pending_req->blkif, pending_req->grefs,
   pending_req->persistent_gnts,
   pending_req->grant_handles, pending_req->pages,
-  req->u.rw.nr_segments,
+  pending_req->nr_pages,
   (pending_req->operation != BLKIF_OP_READ));
if (rc)
return rc;
 
-   for (i = 0; i < req->u.rw.nr_segments; i++)
-   seg[i].buf = pfn_to_mfn(page_to_pfn(pending_req->pages[i]))
-<< PAGE_SHIFT | (req->u.rw.seg[i].first_sect << 9);
+   for (i = 0; i < pending_req->nr_pages; i++)
+   seg[i].buf |= pfn_to_mfn(page_to_pfn(pending_req->pages[i]))
+<< PAGE_SHIFT;
 
return 0;
 }
 
+static int xen_blkbk_parse_indirect(struct blkif_request *req,
+struct pending_req *pending_req,
+struct seg_buf seg[],
+struct phys_req *preq)
+{
+   struct persistent_gnt **persistent =
+   pending_req->indirect_persistent_gnts;
+   struct page **pages = pending_req->indirect_pages;
+   struct xen_blkif *blkif = pending_req->blkif;
+   int indirect_grefs, rc, n, nseg, i;
+   struct blkif_request_segment_aligned *segments = NULL;
+
+   nseg = pending_req->nr_pages;
+   indirect_grefs = (nseg + SEGS_PER_INDIRECT_FRAME - 1) /
+SEGS_PER_INDIRECT_FRAME;
+
+   rc = xen_blkbk_map(blkif, req->u.indirect.indirect_grefs,
+  persistent, pending_req->indirect_handles,
+  pages, indirect_grefs, true);
+   if (rc)
+   goto unmap;
+
+   for (n = 0, i = 0; n < nseg; n++) {
+   if ((n % SEGS_PER_INDIRECT_FRAME) == 0) {
+   /* Map indirect segments */
+   if (segments)
+   kunmap_atomic(segments);
+   segments =
+   

[PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-02-28 Thread Roger Pau Monne
Indirect descriptors introduce a new block operation
(BLKIF_OP_INDIRECT) that passes grant references instead of segments
in the request. This grant references are filled with arrays of
blkif_request_segment_aligned, this way we can send more segments in a
request.

The proposed implementation sets the maximum number of indirect grefs
(frames filled with blkif_request_segment_aligned) to 256 in the
backend and 64 in the frontend. The value in the frontend has been
chosen experimentally, and the backend value has been set to a sane
value that allows expanding the maximum number of indirect descriptors
in the frontend if needed.

The migration code has changed from the previous implementation, in
which we simply remapped the segments on the shared ring. Now the
maximum number of segments allowed in a request can change depending
on the backend, so we have to requeue all the requests in the ring and
in the queue and split the bios in them if they are bigger than the
new maximum number of segments.

Signed-off-by: Roger Pau Monné roger@citrix.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: xen-de...@lists.xen.org
---
 drivers/block/xen-blkback/blkback.c |  129 +++---
 drivers/block/xen-blkback/common.h  |   80 ++-
 drivers/block/xen-blkback/xenbus.c  |8 +
 drivers/block/xen-blkfront.c|  498 +--
 include/xen/interface/io/blkif.h|   25 ++
 5 files changed, 622 insertions(+), 118 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 0fa30db..98eb16b 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -70,7 +70,7 @@ MODULE_PARM_DESC(reqs, Number of blkback requests to 
allocate per backend);
  * algorithm.
  */
 
-static int xen_blkif_max_pgrants = 352;
+static int xen_blkif_max_pgrants = 1024;
 module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
  Maximum number of grants to map persistently);
@@ -578,10 +578,6 @@ purge_gnt_list:
return 0;
 }
 
-struct seg_buf {
-   unsigned long buf;
-   unsigned int nsec;
-};
 /*
  * Unmap the grant references, and also remove the M2P over-rides
  * used in the 'pending_req'.
@@ -761,32 +757,79 @@ out_of_memory:
return -ENOMEM;
 }
 
-static int xen_blkbk_map_seg(struct blkif_request *req,
-struct pending_req *pending_req,
+static int xen_blkbk_map_seg(struct pending_req *pending_req,
 struct seg_buf seg[],
 struct page *pages[])
 {
int i, rc;
-   grant_ref_t grefs[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 
-   for (i = 0; i  req-u.rw.nr_segments; i++)
-   grefs[i] = req-u.rw.seg[i].gref;
-
-   rc = xen_blkbk_map(pending_req-blkif, grefs,
+   rc = xen_blkbk_map(pending_req-blkif, pending_req-grefs,
   pending_req-persistent_gnts,
   pending_req-grant_handles, pending_req-pages,
-  req-u.rw.nr_segments,
+  pending_req-nr_pages,
   (pending_req-operation != BLKIF_OP_READ));
if (rc)
return rc;
 
-   for (i = 0; i  req-u.rw.nr_segments; i++)
-   seg[i].buf = pfn_to_mfn(page_to_pfn(pending_req-pages[i]))
- PAGE_SHIFT | (req-u.rw.seg[i].first_sect  9);
+   for (i = 0; i  pending_req-nr_pages; i++)
+   seg[i].buf |= pfn_to_mfn(page_to_pfn(pending_req-pages[i]))
+ PAGE_SHIFT;
 
return 0;
 }
 
+static int xen_blkbk_parse_indirect(struct blkif_request *req,
+struct pending_req *pending_req,
+struct seg_buf seg[],
+struct phys_req *preq)
+{
+   struct persistent_gnt **persistent =
+   pending_req-indirect_persistent_gnts;
+   struct page **pages = pending_req-indirect_pages;
+   struct xen_blkif *blkif = pending_req-blkif;
+   int indirect_grefs, rc, n, nseg, i;
+   struct blkif_request_segment_aligned *segments = NULL;
+
+   nseg = pending_req-nr_pages;
+   indirect_grefs = (nseg + SEGS_PER_INDIRECT_FRAME - 1) /
+SEGS_PER_INDIRECT_FRAME;
+
+   rc = xen_blkbk_map(blkif, req-u.indirect.indirect_grefs,
+  persistent, pending_req-indirect_handles,
+  pages, indirect_grefs, true);
+   if (rc)
+   goto unmap;
+
+   for (n = 0, i = 0; n  nseg; n++) {
+   if ((n % SEGS_PER_INDIRECT_FRAME) == 0) {
+   /* Map indirect segments */
+   if (segments)
+   kunmap_atomic(segments);
+   segments =
+   

Re: [Xen-devel] [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-02-28 Thread Jan Beulich
 On 28.02.13 at 11:28, Roger Pau Monne roger@citrix.com wrote:
 @@ -109,6 +111,16 @@ typedef uint64_t blkif_sector_t;
   */
  #define BLKIF_MAX_SEGMENTS_PER_REQUEST 11
  
 +#define BLKIF_MAX_INDIRECT_GREFS_PER_REQUEST 8
 +
 +struct blkif_request_segment_aligned {
 + grant_ref_t gref;/* reference to I/O buffer frame*/
 + /* @first_sect: first sector in frame to transfer (inclusive).   */
 + /* @last_sect: last sector in frame to transfer (inclusive). */
 + uint8_t first_sect, last_sect;
 + uint16_t_pad; /* padding to make it 8 bytes, so it's cache-aligned 
 */
 +} __attribute__((__packed__));

What's the __packed__ for here?

 +
  struct blkif_request_rw {
   uint8_tnr_segments;  /* number of segments   */
   blkif_vdev_t   handle;   /* only for read/write requests */
 @@ -138,11 +150,24 @@ struct blkif_request_discard {
   uint8_t_pad3;
  } __attribute__((__packed__));
  
 +struct blkif_request_indirect {
 + uint8_tindirect_op;
 + uint16_t   nr_segments;
 +#ifdef CONFIG_X86_64
 + uint32_t   _pad1;/* offsetof(blkif_...,u.indirect.id) == 8 
 */
 +#endif

Either you want the structure be packed tightly (and you don't care
about misaligned fields), in which case you shouldn't need a padding
field. That's even more so as there's no padding between indirect_op
and nr_segments, so everything is misaligned anyway, and the
comment above is wrong too (offsetof() really ought to yield 7 in
that case).

Or you want the structure fields aligned, in which case you again
ought to drop the use of the __packed__ attribute and introduce
_all_ necessary padding fields.

 + uint64_t   id;
 + blkif_vdev_t   handle;
 + blkif_sector_t sector_number;
 + grant_ref_tindirect_grefs[BLKIF_MAX_INDIRECT_GREFS_PER_REQUEST];
 +} __attribute__((__packed__));

And then it would be quite nice for new features to no longer
require translation between a 32- and a 64-bit layout at all.

Plus, rather than introducing uninitialized padding fields, I'd
suggest using fields that are required to be zero initialized, to
allow giving them a meaning later.

Jan

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-02-28 Thread Roger Pau Monné
On 28/02/13 12:19, Jan Beulich wrote:
 On 28.02.13 at 11:28, Roger Pau Monne roger@citrix.com wrote:
 @@ -109,6 +111,16 @@ typedef uint64_t blkif_sector_t;
   */
  #define BLKIF_MAX_SEGMENTS_PER_REQUEST 11
  
 +#define BLKIF_MAX_INDIRECT_GREFS_PER_REQUEST 8
 +
 +struct blkif_request_segment_aligned {
 +grant_ref_t gref;/* reference to I/O buffer frame*/
 +/* @first_sect: first sector in frame to transfer (inclusive).   */
 +/* @last_sect: last sector in frame to transfer (inclusive). */
 +uint8_t first_sect, last_sect;
 +uint16_t_pad; /* padding to make it 8 bytes, so it's cache-aligned 
 */
 +} __attribute__((__packed__));
 
 What's the __packed__ for here?

Yes, that's not needed.

 
 +
  struct blkif_request_rw {
  uint8_tnr_segments;  /* number of segments   */
  blkif_vdev_t   handle;   /* only for read/write requests */
 @@ -138,11 +150,24 @@ struct blkif_request_discard {
  uint8_t_pad3;
  } __attribute__((__packed__));
  
 +struct blkif_request_indirect {
 +uint8_tindirect_op;
 +uint16_t   nr_segments;
 +#ifdef CONFIG_X86_64
 +uint32_t   _pad1;/* offsetof(blkif_...,u.indirect.id) == 8 
 */
 +#endif
 
 Either you want the structure be packed tightly (and you don't care
 about misaligned fields), in which case you shouldn't need a padding
 field. That's even more so as there's no padding between indirect_op
 and nr_segments, so everything is misaligned anyway, and the
 comment above is wrong too (offsetof() really ought to yield 7 in
 that case).

This padding is because we want to have the id field at the same
position as blkif_request_rw, so we need to add the padding for it to
match 32  64 bit blkif_request_rw structures, this prevents adding some
if (req.op == BLKIF_OP_INDIRECT)... if we only need to get the id of
the request.

The comment is indeed wrong, I've copied it from blkif_request_discard
and forgot to change the offset

 
 Or you want the structure fields aligned, in which case you again
 ought to drop the use of the __packed__ attribute and introduce
 _all_ necessary padding fields.
 
 +uint64_t   id;
 +blkif_vdev_t   handle;
 +blkif_sector_t sector_number;
 +grant_ref_tindirect_grefs[BLKIF_MAX_INDIRECT_GREFS_PER_REQUEST];
 +} __attribute__((__packed__));
 
 And then it would be quite nice for new features to no longer
 require translation between a 32- and a 64-bit layout at all.

The translation is caused by the id issue described above.

 Plus, rather than introducing uninitialized padding fields, I'd
 suggest using fields that are required to be zero initialized, to
 allow giving them a meaning later.



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH RFC 12/12] xen-block: implement indirect descriptors

2013-02-28 Thread Jan Beulich
 On 28.02.13 at 13:00, Roger Pau Monnéroger@citrix.com wrote:
 On 28/02/13 12:19, Jan Beulich wrote:
 On 28.02.13 at 11:28, Roger Pau Monne roger@citrix.com wrote:
 @@ -109,6 +111,16 @@ typedef uint64_t blkif_sector_t;
   */
  #define BLKIF_MAX_SEGMENTS_PER_REQUEST 11
  
 +#define BLKIF_MAX_INDIRECT_GREFS_PER_REQUEST 8
 +
 +struct blkif_request_segment_aligned {
 +   grant_ref_t gref;/* reference to I/O buffer frame*/
 +   /* @first_sect: first sector in frame to transfer (inclusive).   */
 +   /* @last_sect: last sector in frame to transfer (inclusive). */
 +   uint8_t first_sect, last_sect;
 +   uint16_t_pad; /* padding to make it 8 bytes, so it's cache-aligned 
 */
 +} __attribute__((__packed__));
 
 What's the __packed__ for here?
 
 Yes, that's not needed.
 
 
 +
  struct blkif_request_rw {
 uint8_tnr_segments;  /* number of segments   */
 blkif_vdev_t   handle;   /* only for read/write requests */
 @@ -138,11 +150,24 @@ struct blkif_request_discard {
 uint8_t_pad3;
  } __attribute__((__packed__));
  
 +struct blkif_request_indirect {
 +   uint8_tindirect_op;
 +   uint16_t   nr_segments;
 +#ifdef CONFIG_X86_64
 +   uint32_t   _pad1;/* offsetof(blkif_...,u.indirect.id) == 8 
 */
 +#endif
 
 Either you want the structure be packed tightly (and you don't care
 about misaligned fields), in which case you shouldn't need a padding
 field. That's even more so as there's no padding between indirect_op
 and nr_segments, so everything is misaligned anyway, and the
 comment above is wrong too (offsetof() really ought to yield 7 in
 that case).
 
 This padding is because we want to have the id field at the same
 position as blkif_request_rw, so we need to add the padding for it to
 match 32  64 bit blkif_request_rw structures, this prevents adding some
 if (req.op == BLKIF_OP_INDIRECT)... if we only need to get the id of
 the request.

Oh, right, that's desirable of course.

 The comment is indeed wrong, I've copied it from blkif_request_discard
 and forgot to change the offset

But the offset stated there then is right after all - I forgot that
there is a 1-byte field outside the union (the way this is being done
in the upstream Linux header is really ugly imo, but I guess Jeremy
and/or Konrad liked it that way). That's also why the packed
attribute is needed here.

But you will probably want to switch sector_number and handle, so
that sector_number becomes aligned, and add another 16-bit
padding field between handle and indirect_grefs[].

I also wonder whether indirect_op wouldn't better be named
actual_op or just op.

Jan
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/