Re: [PATCH v2] add bidi support for block pc requests
James Bottomley wrote:
> On Sat, 2007-07-07 at 11:27 -0400, Jeff Garzik wrote:
>> LIBATA_MAX_PRD is the maximum number of DMA scatter/gather elements
>> permitted by the HBA's DMA engine, for a single ATA command.
>
> Then it's the wrong parameter you're setting: phys_segments is what you
> have going into a dma_map_sg() but hw_segments is what you end up with
> after it (mathematically, the mapping never splits segments, so
> hw_segments <= phys_segments always, but it's still the wrong way to
> bound this); so if you want to limit the SG elements in the HBA, you
> should set hw_segments not phys_segments (which is what the
> sg_tablesize parameter of the host structure corresponds to).

Honestly I'm betting the code is a result of paranoia^H^H^Hconservatism on my part: setting every DMA and segment limit I can find, in the hopes that somehow, somewhere, that information will get filtered down to the IOMMU and whatnot. :)

Given what you say above, and the fact that I set sg_tablesize, presumably Boaz's patch to remove phys_segment limiting in libata-scsi is the right thing to do. Your knowledge in this area is most likely stronger than mine.

> However, I suspect this has to do with our iommu problem, doesn't it?
> The IOMMU may merge the segments beyond your 64k DMA boundary, so you
> need to split them again ... in that case you need phys_segments
> bounded, not hw_segments?

This is why I set every segment limitation I could find :) And then Ben tells me the IOMMU may ignore all of that anyway, and so I have code to split inside libata as well :)

I just know what IDE needs: an S/G ent shouldn't cross a 64k boundary, nor exceed 64k in size. The maximum number of S/G ents is actually a guess. Hard data is tough to come by. Some DMA engines will cycle through all of memory until they hit a stop bit. Others have limits yet to be discovered.
:)

	Jeff
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
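Jeff's IDE rule above (no S/G entry may cross a 64k boundary, and none may exceed 64k in size) can be sketched with a small standalone helper. This is illustration only, not libata code; the function name and constant are invented here:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration, not kernel code: split a physical segment
 * so that no resulting entry crosses a 64k boundary (which also caps
 * each entry at 64k), and count how many PRD entries that takes. */
#define IDE_BOUNDARY 0x10000u	/* 64k */

static unsigned int prd_entries_needed(uint32_t addr, uint32_t len)
{
	unsigned int count = 0;

	while (len) {
		/* bytes remaining before the next 64k boundary */
		uint32_t chunk = IDE_BOUNDARY - (addr & (IDE_BOUNDARY - 1));

		if (chunk > len)
			chunk = len;
		addr += chunk;
		len -= chunk;
		count++;
	}
	return count;
}
```

An aligned 64k segment fits in one entry, but the same 64k starting mid-boundary needs two; that is why merely bounding the segment *count* cannot by itself satisfy the boundary rule.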
Re: [PATCH v2] add bidi support for block pc requests
On Sat, 2007-07-07 at 11:27 -0400, Jeff Garzik wrote: > James Bottomley wrote: > > On Sun, 2007-06-03 at 10:45 +0300, Boaz Harrosh wrote: > >> Jeff Garzik wrote: > >>> Boaz Harrosh wrote: > >>>> FUJITA Tomonori wrote: > >>>>> From: Boaz Harrosh <[EMAIL PROTECTED]> > >>>>> Subject: Re: [PATCH v2] add bidi support for block pc requests > >>>>> Date: Thu, 17 May 2007 17:00:21 +0300 > >>>>> > >>>>>> Yes Tomo found it at ata_scsi_slave_config(). Attached below the way I > >>>>>> fixed it. Now it works with 127. > >>>>> I think that we can just remove blk_queue_max_phys_segments since the > >>>>> ata drivers seem to set sg_tablesize to LIBATA_MAX_PRD. > >>>>> > >>>> Who should send a patch upstream? (I cc'ed Jeff Garzik) > >>>> > >>>> Boaz > >>>> > >>>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c > >>>> index dd81fa7..3660f3e 100644 > >>>> --- a/drivers/ata/libata-scsi.c > >>>> +++ b/drivers/ata/libata-scsi.c > >>>> @@ -800,8 +800,6 @@ int ata_scsi_slave_config(struct scsi_device *sdev) > >>>> > >>>> ata_scsi_sdev_config(sdev); > >>>> > >>>> -blk_queue_max_phys_segments(sdev->request_queue, > >>>> LIBATA_MAX_PRD); > >>>> - > >>>> sdev->manage_start_stop = 1; > >>> I don't mind the patch, but could someone refresh me as to the context? > >>> > >>> Is there something wrong with the above code, or is it simply redundant > >>> to the scsi_host_template settings in each LLDD? > >>> > >>> Jeff > >>> > >>> > >>> > >> Hi Jeff > >> What happens is that if SCSI-ml sets a higher value than LIBATA_MAX_PRD > >> (=128) > >> then everything is OK and libata-scsi will only see its LIBATA_MAX_PRD. > >> But what > >> happens if SCSI-ml sets a lower value? It will then crash on unexpected > >> high sg > >> count. My first Patch was an "if >" but Tomo said that it is redundant > >> since > >> drivers do that already. So I guess it is your call.
Can it be removed or > >> we need > >> a: if ( sdev->request_queue->max_phys_segments > LIBATA_MAX_PRD) > > > > Ordinarily, in SCSI, phys_segments is the wrong thing for a driver to > > set because what a driver wants to see is what comes out the other side > > of dma_map_sg() (which is controlled by hw_segments) not what goes in. > > However, I could see libata having an interest in the phys_segments if > > the physical region descriptors are going to the device via PIO ... is > > that what this is for? > > LIBATA_MAX_PRD is the maximum number of DMA scatter/gather elements > permitted by the HBA's DMA engine, for a single ATA command. Then it's the wrong parameter you're setting: phys_segments is what you have going into a dma_map_sg() but hw_segments is what you end up with after it (mathematically, the mapping never splits segments, so hw_segments <= phys_segments always, but it's still the wrong way to bound this); so if you want to limit the SG elements in the HBA, you should set hw_segments not phys_segments (which is what the sg_tablesize parameter of the host structure corresponds to). However, I suspect this has to do with our iommu problem, doesn't it? The IOMMU may merge the segments beyond your 64k DMA boundary, so you need to split them again ... in that case you need phys_segments bounded, not hw_segments? I really wish we could get this iommu parameter problem fixed so we didn't have to have all of this confusing code. > The PIO code doesn't really care about segments. OK ... I just thought it was PIO because the only time you should worry about phys_segments is if you're not going to do a dma_map_sg() on the scatterlist.

	James
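James's invariant above, that dma_map_sg() may merge adjacent segments but never splits them (so the mapped hw count is always <= the input phys count), can be shown with a toy model. The struct and merge policy here are invented for illustration; a real IOMMU applies its own coalescing rules:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of dma_map_sg() coalescing, for illustration only:
 * physically adjacent input segments are merged in place, so the
 * returned "hw" entry count can only shrink, never grow. */
struct seg {
	uint32_t addr;
	uint32_t len;
};

static size_t toy_map_sg(struct seg *sg, size_t phys_segments)
{
	size_t hw = 0;

	for (size_t i = 0; i < phys_segments; i++) {
		if (hw && sg[hw - 1].addr + sg[hw - 1].len == sg[i].addr)
			sg[hw - 1].len += sg[i].len;	/* merge into previous */
		else
			sg[hw++] = sg[i];
	}
	return hw;	/* hw <= phys_segments always */
}
```

This is the reason an HBA's S/G table limit belongs on hw_segments (sg_tablesize): bounding phys_segments constrains the wrong side of the mapping.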
Re: [PATCH v2] add bidi support for block pc requests
James Bottomley wrote: On Sun, 2007-06-03 at 10:45 +0300, Boaz Harrosh wrote: Jeff Garzik wrote: Boaz Harrosh wrote: FUJITA Tomonori wrote: From: Boaz Harrosh <[EMAIL PROTECTED]> Subject: Re: [PATCH v2] add bidi support for block pc requests Date: Thu, 17 May 2007 17:00:21 +0300 Yes Tomo found it at ata_scsi_slave_config(). Attached below the way I fixed it. Now it works with 127. I think that we can just remove blk_queue_max_phys_segments since the ata drivers seem to set sg_tablesize to LIBATA_MAX_PRD. Who should send a patch upstream? (I cc'ed Jeff Garzik) Boaz diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index dd81fa7..3660f3e 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -800,8 +800,6 @@ int ata_scsi_slave_config(struct scsi_device *sdev) ata_scsi_sdev_config(sdev); - blk_queue_max_phys_segments(sdev->request_queue, LIBATA_MAX_PRD); - sdev->manage_start_stop = 1; I don't mind the patch, but could someone refresh me as to the context? Is there something wrong with the above code, or is it simply redundant to the scsi_host_template settings in each LLDD? Jeff Hi Jeff What happens is that if SCSI-ml sets a higher value than LIBATA_MAX_PRD (=128) then everything is OK and libata-scsi will only see its LIBATA_MAX_PRD. But what happens if SCSI-ml sets a lower value? It will then crash on unexpected high sg count. My first Patch was an "if >" but Tomo said that it is redundant since drivers do that already. So I guess it is your call. Can it be removed or we need a: if ( sdev->request_queue->max_phys_segments > LIBATA_MAX_PRD) Ordinarily, in SCSI, phys_segments is the wrong thing for a driver to set because what a driver wants to see is what comes out the other side of dma_map_sg() (which is controlled by hw_segments) not what goes in. However, I could see libata having an interest in the phys_segments if the physical region descriptors are going to the device via PIO ... is that what this is for?
LIBATA_MAX_PRD is the maximum number of DMA scatter/gather elements permitted by the HBA's DMA engine, for a single ATA command. The PIO code doesn't really care about segments.

	Jeff
Re: [PATCH v2] add bidi support for block pc requests
On Sun, 2007-06-03 at 10:45 +0300, Boaz Harrosh wrote: > Jeff Garzik wrote: > > Boaz Harrosh wrote: > >> FUJITA Tomonori wrote: > >>> From: Boaz Harrosh <[EMAIL PROTECTED]> > >>> Subject: Re: [PATCH v2] add bidi support for block pc requests > >>> Date: Thu, 17 May 2007 17:00:21 +0300 > >>> > >>>> Yes Tomo found it at ata_scsi_slave_config(). Attached below the way I > >>>> fixed it. Now it works with 127. > >>> I think that we can just remove blk_queue_max_phys_segments since the > >>> ata drivers seem to set sg_tablesize to LIBATA_MAX_PRD. > >>> > >> Who should send a patch upstream? (I cc'ed Jeff Garzik) > >> > >> Boaz > >> > >> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c > >> index dd81fa7..3660f3e 100644 > >> --- a/drivers/ata/libata-scsi.c > >> +++ b/drivers/ata/libata-scsi.c > >> @@ -800,8 +800,6 @@ int ata_scsi_slave_config(struct scsi_device *sdev) > >> > >>ata_scsi_sdev_config(sdev); > >> > >> - blk_queue_max_phys_segments(sdev->request_queue, LIBATA_MAX_PRD); > >> - > >>sdev->manage_start_stop = 1; > > > > I don't mind the patch, but could someone refresh me as to the context? > > > > Is there something wrong with the above code, or is it simply redundant > > to the scsi_host_template settings in each LLDD? > > > > Jeff > > > > > > > > Hi Jeff > What happens is that if SCSI-ml sets a higher value than LIBATA_MAX_PRD > (=128) > then everything is OK and libata-scsi will only see its LIBATA_MAX_PRD. But > what > happens if SCSI-ml sets a lower value? It will then crash on unexpected high > sg > count. My first Patch was an "if >" but Tomo said that it is redundant since > drivers do that already. So I guess it is your call. Can it be removed or we > need > a: if ( sdev->request_queue->max_phys_segments > LIBATA_MAX_PRD) Ordinarily, in SCSI, phys_segments is the wrong thing for a driver to set because what a driver wants to see is what comes out the other side of dma_map_sg() (which is controlled by hw_segments) not what goes in.
However, I could see libata having an interest in the phys_segments if the physical region descriptors are going to the device via PIO ... is that what this is for?

	James
Re: [PATCH v2] add bidi support for block pc requests
Jeff Garzik wrote: > Boaz Harrosh wrote: >> FUJITA Tomonori wrote: >>> From: Boaz Harrosh <[EMAIL PROTECTED]> >>> Subject: Re: [PATCH v2] add bidi support for block pc requests >>> Date: Thu, 17 May 2007 17:00:21 +0300 >>> >>>> Yes Tomo found it at ata_scsi_slave_config(). Attached below the way I >>>> fixed it. Now it works with 127. >>> I think that we can just remove blk_queue_max_phys_segments since the >>> ata drivers seem to set sg_tablesize to LIBATA_MAX_PRD. >>> >> Who should send a patch upstream? (I cc'ed Jeff Garzik) >> >> Boaz >> >> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c >> index dd81fa7..3660f3e 100644 >> --- a/drivers/ata/libata-scsi.c >> +++ b/drivers/ata/libata-scsi.c >> @@ -800,8 +800,6 @@ int ata_scsi_slave_config(struct scsi_device *sdev) >> >> ata_scsi_sdev_config(sdev); >> >> -blk_queue_max_phys_segments(sdev->request_queue, LIBATA_MAX_PRD); >> - >> sdev->manage_start_stop = 1; > > I don't mind the patch, but could someone refresh me as to the context? > > Is there something wrong with the above code, or is it simply redundant > to the scsi_host_template settings in each LLDD? > > Jeff > > > Hi Jeff What happens is that if SCSI-ml sets a higher value than LIBATA_MAX_PRD (=128) then everything is OK and libata-scsi will only see its LIBATA_MAX_PRD. But what happens if SCSI-ml sets a lower value? It will then crash on unexpected high sg count. My first Patch was an "if >" but Tomo said that it is redundant since drivers do that already. So I guess it is your call. Can it be removed or we need a: if ( sdev->request_queue->max_phys_segments > LIBATA_MAX_PRD)

	Boaz
Re: [PATCH v2] add bidi support for block pc requests
Boaz Harrosh wrote: FUJITA Tomonori wrote: From: Boaz Harrosh <[EMAIL PROTECTED]> Subject: Re: [PATCH v2] add bidi support for block pc requests Date: Thu, 17 May 2007 17:00:21 +0300 Yes Tomo found it at ata_scsi_slave_config(). Attached below the way I fixed it. Now it works with 127. I think that we can just remove blk_queue_max_phys_segments since the ata drivers seem to set sg_tablesize to LIBATA_MAX_PRD. Who should send a patch upstream? (I cc'ed Jeff Garzik) Boaz diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index dd81fa7..3660f3e 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -800,8 +800,6 @@ int ata_scsi_slave_config(struct scsi_device *sdev) ata_scsi_sdev_config(sdev); - blk_queue_max_phys_segments(sdev->request_queue, LIBATA_MAX_PRD); - sdev->manage_start_stop = 1; I don't mind the patch, but could someone refresh me as to the context? Is there something wrong with the above code, or is it simply redundant to the scsi_host_template settings in each LLDD?

	Jeff
Re: [PATCH v2] add bidi support for block pc requests
FUJITA Tomonori wrote: > From: Boaz Harrosh <[EMAIL PROTECTED]> > Subject: Re: [PATCH v2] add bidi support for block pc requests > Date: Thu, 24 May 2007 19:37:06 +0300 > >> FUJITA Tomonori wrote: >>>> FUJITA Tomonori wrote: >>> One thing that I found is: >>> >>> +#define scsi_resid(cmd) ((cmd)->sg_table->resid) >>> >>> >>> This doesn't work for some drivers (at least ipr) since they set >>> cmd->resid even with commands without data transfer. >>> >> James, Tomo. >> >> the last accessor: >> +#define scsi_resid(cmd) ((cmd)->resid) >> >> used as an l-value in drivers does not serve our purpose, as seen by the test >> implementation of scsi_sg_table. Now clearly this needs an accessor and it >> is a >> bidi parameter (need 2 of them). > > I thought that it would be better to fix several drivers (less than 10). I prefer inlines. One - the programmer cannot make mistakes. Why give him the freedom to do something he must not do? Two - if all/most drivers are doing: if (scsi_sgl(cmd)) scsi_resid(cmd) = 0; then will it not be better to do the if() inside the API?

	Boaz
Re: [PATCH v2] add bidi support for block pc requests
From: Boaz Harrosh <[EMAIL PROTECTED]> Subject: Re: [PATCH v2] add bidi support for block pc requests Date: Thu, 24 May 2007 19:37:06 +0300 > FUJITA Tomonori wrote: > >> FUJITA Tomonori wrote: > > > > One thing that I found is: > > > > +#define scsi_resid(cmd) ((cmd)->sg_table->resid) > > > > > > This doesn't work for some drivers (at least ipr) since they set > > cmd->resid even with commands without data transfer. > > > > James, Tomo. > > the last accessor: > +#define scsi_resid(cmd) ((cmd)->resid) > > used as an l-value in drivers does not serve our purpose, as seen by the test > implementation of scsi_sg_table. Now clearly this needs an accessor and it is > a > bidi parameter (need 2 of them). I thought that it would be better to fix several drivers (less than 10).
Re: [PATCH v2] add bidi support for block pc requests
Boaz Harrosh wrote: > FUJITA Tomonori wrote: >>> FUJITA Tomonori wrote: >> One thing that I found is: >> >> +#define scsi_resid(cmd) ((cmd)->sg_table->resid) >> >> >> This doesn't work for some drivers (at least ipr) since they set >> cmd->resid even with commands without data transfer. >> > > James, Tomo. > > the last accessor: > +#define scsi_resid(cmd) ((cmd)->resid) > > used as an l-value in drivers does not serve our purpose, as seen by the test > implementation of scsi_sg_table. Now clearly this needs an accessor and it is > a > bidi parameter (need 2 of them). > > I would suggest doing a set_ inline accessor and using that in drivers: > +static inline void scsi_set_resid(struct scsi_cmnd *cmd, int resid) > +{ > + cmd->resid = resid; > +} > > I would even suggest to make all accessors inlines and not macros just to > make sure > they are not used as l-value and modified. Though I have not seen such use in > Tomo's patchset. > > example: > +static inline unsigned scsi_bufflen(struct scsi_cmnd *cmd) > +{ > + return cmd->request_bufflen; > +} > > Boaz > - I forgot to say. Tomo, I have started to convert to the above, based on your cleanup git-tree. Do you need me to send a patch? (One thing I still don't have a good solution for: when having a git tree like that with a long patchset, how do I fix/update individual patches?)

	Boaz
Re: [PATCH v2] add bidi support for block pc requests
FUJITA Tomonori wrote: >> FUJITA Tomonori wrote: > > One thing that I found is: > > +#define scsi_resid(cmd) ((cmd)->sg_table->resid) > > > This doesn't work for some drivers (at least ipr) since they set > cmd->resid even with commands without data transfer. > James, Tomo. the last accessor: +#define scsi_resid(cmd) ((cmd)->resid) used as an l-value in drivers does not serve our purpose, as seen by the test implementation of scsi_sg_table. Now clearly this needs an accessor and it is a bidi parameter (need 2 of them). I would suggest doing a set_ inline accessor and using that in drivers: +static inline void scsi_set_resid(struct scsi_cmnd *cmd, int resid) +{ + cmd->resid = resid; +} I would even suggest to make all accessors inlines and not macros just to make sure they are not used as l-value and modified. Though I have not seen such use in Tomo's patchset. example: +static inline unsigned scsi_bufflen(struct scsi_cmnd *cmd) +{ + return cmd->request_bufflen; +}

	Boaz
Re: [PATCH v2] add bidi support for block pc requests
FUJITA Tomonori wrote: > From: Boaz Harrosh <[EMAIL PROTECTED]> > Subject: Re: [PATCH v2] add bidi support for block pc requests > Date: Thu, 17 May 2007 11:49:37 +0300 > >> FUJITA Tomonori wrote: >>> From: Jens Axboe <[EMAIL PROTECTED]> >>> Subject: Re: [PATCH v2] add bidi support for block pc requests >>> Date: Thu, 17 May 2007 07:48:13 +0200 >>> >>>> On Thu, May 17 2007, FUJITA Tomonori wrote: >>>>> From: Jens Axboe <[EMAIL PROTECTED]> >>>>> Subject: Re: [PATCH v2] add bidi support for block pc requests >>>>> Date: Wed, 16 May 2007 19:53:22 +0200 >>>>> >>>>>> On Wed, May 16 2007, Boaz Harrosh wrote: >>>>>>> now there are 2 issues with this: >>>>>>> 1. Even if I do blk_queue_max_phys_segments(q, 127); I still get >>>>>>>requests for use_sg=128 which will crash the kernel. >>>>>> That sounds like a serious issue, it should definitely not happen. Stuff >>>>>> like that would bite other drivers as well, are you absolutely sure that >>>>>> is happening? Power-of-2 bug in your code, or in the SCSI code? >>>>> Boaz, how do you send requests to the scsi-ml, via fs, sg, or bsg? >> These are regular fs (ext3) requests during bootup. The machine will not >> boot. (Usually from the read ahead code) >> Don't believe me look at the second patch Over Tomo's cleanup. >> If I define SCSI_MAX_SG_SEGMENTS to 127 it will crash even when I >> did in code: >> blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS); >> I suppose someone is looking at a different definition. Or there is >> another call I need to do for this to work. > > I modified your patch a bit (sgtable allocation) and set > SCSI_MAX_SG_SEGMENTS to 127. My ppc64 box seems to work (with > iscsi_tcp and ipr drivers). > > iscsi_tcp sets sg_tablesize to 255, but blk_queue_max_phys_segments > seems to work. > > One thing that I found is: > > +#define scsi_resid(cmd) ((cmd)->sg_table->resid) > > > This doesn't work for some drivers (at least ipr) since they set > cmd->resid even with commands without data transfer. 
Hmm, since we need a residual count also for the bidi_in transfer, this problem is another reason for having the scsi_cmd_buff in struct scsi_cmnd, allocate another one from a pool for the bidi case, and point to the sglist in both cases rather than having a sg_table header allocated along with the sg list. Alternatively, having a pool for the no-data case might be another possible solution, though it seems a bit odd to me. > -void scsi_free_sgtable(struct scatterlist *sgl, int index) > +static struct scsi_sg_table *scsi_alloc_sgtable(int nseg, gfp_t gfp_mask) > { > struct scsi_host_sg_pool *sgp; > + struct scsi_sg_table *sgt; > + unsigned int idx; > > - BUG_ON(index >= SG_MEMPOOL_NR); > + for (idx = 0; idx < SG_MEMPOOL_NR; idx++) > + if (scsi_sg_pools[idx].size >= nseg) > + goto found; Tomo, I prefer the loop to be based on calculation as follows rather than scanning the scsi_sg_pools table, in order to minimize memory access (and thrashing of the cpu data cache - each scsi_host_sg_pool takes a cache row on x86_64): + for (i = 0, size = 8; i < SG_MEMPOOL_NR-1; i++, size <<= 1) + if (size >= nents) + return i;

	Benny
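A self-contained sketch of the computed lookup Benny proposes follows; the pool count and the 8, 16, 32, ... size progression are assumptions here, and the comparison uses >= so a request for exactly a pool's size lands in that pool:

```c
#include <assert.h>

#define SG_MEMPOOL_NR 5		/* assumed pools: 8, 16, 32, 64, 128 */

/* Compute the mempool index from the entry count instead of scanning
 * the scsi_sg_pools table, so the lookup touches no table memory. */
static int sg_pool_index(int nents)
{
	int i, size;

	for (i = 0, size = 8; i < SG_MEMPOOL_NR - 1; i++, size <<= 1)
		if (size >= nents)
			return i;
	return SG_MEMPOOL_NR - 1;	/* largest pool catches the rest */
}
```

With 5 pools doubling from 8, any nents up to 64 maps to pools 0-3 and everything larger falls through to the last pool.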
Re: [PATCH v2] add bidi support for block pc requests
FUJITA Tomonori wrote: > From: Boaz Harrosh <[EMAIL PROTECTED]> > Subject: Re: [PATCH v2] add bidi support for block pc requests > Date: Thu, 17 May 2007 17:00:21 +0300 > >> Yes Tomo found it at ata_scsi_slave_config(). Attached below the way I >> fixed it. Now it works with 127. > > I think that we can just remove blk_queue_max_phys_segments since the > ata drivers seem to set sg_tablesize to LIBATA_MAX_PRD. > Who should send a patch upstream? (I cc'ed Jeff Garzik) Boaz diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index dd81fa7..3660f3e 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -800,8 +800,6 @@ int ata_scsi_slave_config(struct scsi_device *sdev) ata_scsi_sdev_config(sdev); - blk_queue_max_phys_segments(sdev->request_queue, LIBATA_MAX_PRD); - sdev->manage_start_stop = 1; if (dev)
Re: [PATCH v2] add bidi support for block pc requests
From: Boaz Harrosh <[EMAIL PROTECTED]> Subject: Re: [PATCH v2] add bidi support for block pc requests Date: Thu, 17 May 2007 17:00:21 +0300 > James Bottomley wrote: > > On Thu, 2007-05-17 at 11:49 +0300, Boaz Harrosh wrote: > >> These are regular fs (ext3) requests during bootup. The machine will not > >> boot. (Usually from the read ahead code) > >> Don't believe me look at the second patch Over Tomo's cleanup. > >> If I define SCSI_MAX_SG_SEGMENTS to 127 it will crash even when I > >> did in code: > >>blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS); > >> I suppose someone is looking at a different definition. Or there is > >> another call I need to do for this to work. > > > > It would really help us if you showed the actual code for what you did > > and where ... if this is wrong, we have bigger problems than quibbling > > about bidirectional slab sizes. The correct way to adjust this limit > > artificially to 127 is: > > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > > index 1f5a07b..4a27841 100644 > > --- a/drivers/scsi/scsi_lib.c > > +++ b/drivers/scsi/scsi_lib.c > > @@ -1576,7 +1576,7 @@ struct request_queue *__scsi_alloc_queue(struct > > Scsi_Host *shost, > > return NULL; > > > > blk_queue_max_hw_segments(q, shost->sg_tablesize); > > - blk_queue_max_phys_segments(q, SCSI_MAX_PHYS_SEGMENTS); > > + blk_queue_max_phys_segments(q, 127); > > blk_queue_max_sectors(q, shost->max_sectors); > > blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost)); > > blk_queue_segment_boundary(q, shost->dma_boundary); > > > > (It doesn't alter the allocation pools or anything else, just limits the > > max phys segments of the queue).
The way to check that this limit is > > being honoured is: > > > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > > index 1f5a07b..ae42e4d 100644 > > --- a/drivers/scsi/scsi_lib.c > > +++ b/drivers/scsi/scsi_lib.c > > @@ -1000,6 +1000,7 @@ static int scsi_init_io(struct scsi_cmnd *cmd) > > * kmapping pages) > > */ > > cmd->use_sg = req->nr_phys_segments; > > + WARN_ON(req->nr_phys_segments > 127); > > > > /* > > * If sg table allocation fails, requeue request later. > > > > James > > > Yes Tomo found it at ata_scsi_slave_config(). Attached below the way I > fixed it. Now it works with 127. I think that we can just remove blk_queue_max_phys_segments since the ata drivers seem to set sg_tablesize to LIBATA_MAX_PRD. > (By the way Tomo, a printk like you did in scsi_init_io and > scsi_free_sgtable, 2 for every IO, or even 1 for every IO, will make > a SCSI booting PC like mine almost unusable. Think of printk going > to log-file doing a printk...) Oops, I forgot to remove it since my SCSI booting pSeries box works with that. :)
Re: [PATCH v2] add bidi support for block pc requests
James Bottomley wrote: > On Thu, 2007-05-17 at 11:49 +0300, Boaz Harrosh wrote: >> These are regular fs (ext3) requests during bootup. The machine will not >> boot. (Usually from the read ahead code) >> Don't believe me look at the second patch Over Tomo's cleanup. >> If I define SCSI_MAX_SG_SEGMENTS to 127 it will crash even when I >> did in code: >> blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS); >> I suppose someone is looking at a different definition. Or there is >> another call I need to do for this to work. > > It would really help us if you showed the actual code for what you did > and where ... if this is wrong, we have bigger problems than quibbling > about bidirectional slab sizes. The correct way to adjust this limit > artificially to 127 is: > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index 1f5a07b..4a27841 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -1576,7 +1576,7 @@ struct request_queue *__scsi_alloc_queue(struct > Scsi_Host *shost, > return NULL; > > blk_queue_max_hw_segments(q, shost->sg_tablesize); > - blk_queue_max_phys_segments(q, SCSI_MAX_PHYS_SEGMENTS); > + blk_queue_max_phys_segments(q, 127); > blk_queue_max_sectors(q, shost->max_sectors); > blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost)); > blk_queue_segment_boundary(q, shost->dma_boundary); > > (It doesn't alter the allocation pools or anything else, just limits the > max phys segments of the queue). The way to check that this limit is > being honoured is: > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index 1f5a07b..ae42e4d 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -1000,6 +1000,7 @@ static int scsi_init_io(struct scsi_cmnd *cmd) >* kmapping pages) >*/ > cmd->use_sg = req->nr_phys_segments; > + WARN_ON(req->nr_phys_segments > 127); > > /* >* If sg table allocation fails, requeue request later. > > James > > Yes Tomo found it at ata_scsi_slave_config().
Attached below the way I fixed it. Now it works with 127. (By the way Tomo, a printk like you did in scsi_init_io and scsi_free_sgtable, 2 for every IO, or even 1 for every IO, will make a SCSI booting PC like mine almost unusable. Think of printk going to log-file doing a printk...) diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index dd81fa7..de8c796 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -800,7 +800,9 @@ int ata_scsi_slave_config(struct scsi_device *sdev) ata_scsi_sdev_config(sdev); - blk_queue_max_phys_segments(sdev->request_queue, LIBATA_MAX_PRD); + if ( sdev->request_queue->max_phys_segments > LIBATA_MAX_PRD ) + blk_queue_max_phys_segments(sdev->request_queue, + LIBATA_MAX_PRD); sdev->manage_start_stop = 1;
Re: [PATCH v2] add bidi support for block pc requests
On Thu, 2007-05-17 at 11:49 +0300, Boaz Harrosh wrote: > These are regular fs (ext3) requests during bootup. The machine will not > boot. (Usually from the read ahead code) > Don't believe me look at the second patch Over Tomo's cleanup. > If I define SCSI_MAX_SG_SEGMENTS to 127 it will crash even when I > did in code: > blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS); > I suppose someone is looking at a different definition. Or there is > another call I need to do for this to work. It would really help us if you showed the actual code for what you did and where ... if this is wrong, we have bigger problems than quibbling about bidirectional slab sizes. The correct way to adjust this limit artificially to 127 is: diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 1f5a07b..4a27841 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1576,7 +1576,7 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost, return NULL; blk_queue_max_hw_segments(q, shost->sg_tablesize); - blk_queue_max_phys_segments(q, SCSI_MAX_PHYS_SEGMENTS); + blk_queue_max_phys_segments(q, 127); blk_queue_max_sectors(q, shost->max_sectors); blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost)); blk_queue_segment_boundary(q, shost->dma_boundary); (It doesn't alter the allocation pools or anything else, just limits the max phys segments of the queue). The way to check that this limit is being honoured is: diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 1f5a07b..ae42e4d 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1000,6 +1000,7 @@ static int scsi_init_io(struct scsi_cmnd *cmd) * kmapping pages) */ cmd->use_sg = req->nr_phys_segments; + WARN_ON(req->nr_phys_segments > 127); /* * If sg table allocation fails, requeue request later.
	James
Re: [PATCH v2] add bidi support for block pc requests
From: Boaz Harrosh <[EMAIL PROTECTED]> Subject: Re: [PATCH v2] add bidi support for block pc requests Date: Thu, 17 May 2007 11:49:37 +0300 > FUJITA Tomonori wrote: > > From: Jens Axboe <[EMAIL PROTECTED]> > > Subject: Re: [PATCH v2] add bidi support for block pc requests > > Date: Thu, 17 May 2007 07:48:13 +0200 > > > >> On Thu, May 17 2007, FUJITA Tomonori wrote: > >>> From: Jens Axboe <[EMAIL PROTECTED]> > >>> Subject: Re: [PATCH v2] add bidi support for block pc requests > >>> Date: Wed, 16 May 2007 19:53:22 +0200 > >>> > >>>> On Wed, May 16 2007, Boaz Harrosh wrote: > >>>>> now there are 2 issues with this: > >>>>> 1. Even if I do blk_queue_max_phys_segments(q, 127); I still get > >>>>>requests for use_sg=128 which will crash the kernel. > >>>> That sounds like a serious issue, it should definitely not happen. Stuff > >>>> like that would bite other drivers as well, are you absolutely sure that > >>>> is happening? Power-of-2 bug in your code, or in the SCSI code? > >>> Boaz, how do you send requests to the scsi-ml, via fs, sg, or bsg? > > These are regular fs (ext3) requests during bootup. The machine will not > boot. (Usually from the read ahead code) > Don't believe me look at the second patch Over Tomo's cleanup. > If I define SCSI_MAX_SG_SEGMENTS to 127 it will crash even when I > did in code: > blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS); > I suppose someone is looking at a different definition. Or there is > another call I need to do for this to work. ata_scsi_slave_config?
Re: [PATCH v2] add bidi support for block pc requests
From: Boaz Harrosh <[EMAIL PROTECTED]> Subject: Re: [PATCH v2] add bidi support for block pc requests Date: Thu, 17 May 2007 11:49:37 +0300 > FUJITA Tomonori wrote: > > From: Jens Axboe <[EMAIL PROTECTED]> > > Subject: Re: [PATCH v2] add bidi support for block pc requests > > Date: Thu, 17 May 2007 07:48:13 +0200 > > > >> On Thu, May 17 2007, FUJITA Tomonori wrote: > >>> From: Jens Axboe <[EMAIL PROTECTED]> > >>> Subject: Re: [PATCH v2] add bidi support for block pc requests > >>> Date: Wed, 16 May 2007 19:53:22 +0200 > >>> > >>>> On Wed, May 16 2007, Boaz Harrosh wrote: > >>>>> now there are 2 issues with this: > >>>>> 1. Even if I do blk_queue_max_phys_segments(q, 127); I still get > >>>>>requests for use_sg=128 which will crash the kernel. > >>>> That sounds like a serious issue, it should definitely not happen. Stuff > >>>> like that would bite other drivers as well, are you absolutely sure that > >>>> is happening? Power-of-2 bug in your code, or in the SCSI code? > >>> Boaz, how do you send requests to the scsi-ml, via fs, sg, or bsg? > > These are regular fs (ext3) requests during bootup. The machine will not > boot. (Usually from the read ahead code) > Don't believe me look at the second patch Over Tomo's cleanup. > If I define SCSI_MAX_SG_SEGMENTS to 127 it will crash even when I > did in code: > blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS); > I suppose someone is looking at a different definition. Or there is > another call I need to do for this to work. I modified your patch a bit (sgtable allocation) and set SCSI_MAX_SG_SEGMENTS to 127. My ppc64 box seems to work (with iscsi_tcp and ipr drivers). iscsi_tcp sets sg_tablesize to 255, but blk_queue_max_phys_segments seems to work. One thing that I found is: +#define scsi_resid(cmd) ((cmd)->sg_table->resid) This doesn't work for some drivers (at least ipr) since they set cmd->resid even with commands without data transfer. This patch is against my cleanup tree. 
Boaz, btw, don't worry about scsi_tgt_lib.c on scsi_alloc/free_sgtable. You can break it. I need to fix it for bidi anyway.

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 70454b4..d8fb9c4 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -35,33 +35,34 @@
 #define SG_MEMPOOL_SIZE		2
 
 struct scsi_host_sg_pool {
 	size_t		size;
-	char		*name;
+	char		*name;
 	struct kmem_cache	*slab;
 	mempool_t	*pool;
 };
 
-#if (SCSI_MAX_PHYS_SEGMENTS < 32)
-#error SCSI_MAX_PHYS_SEGMENTS is too small
-#endif
+/*
+ * Should fit within a single page.
+ */
+enum { SCSI_MAX_SG_SEGMENTS = 127 };
+
+enum { SCSI_MAX_SG_SEGMENTS_CALC =
+	((PAGE_SIZE - sizeof(struct scsi_sg_table)) /
+	 sizeof(struct scatterlist)) };
+
+#define SG_MEMPOOL_NR	ARRAY_SIZE(scsi_sg_pools)
 
-#define SP(x) { x, "sgpool-" #x }
+/*#define SG_MEMPOOL_NR	(ARRAY_SIZE(scsi_sg_pools) - \
+			 (SCSI_MAX_SG_SEGMENTS < 254))*/
+
+#define SP(x) { x, "sgpool-" #x }
 static struct scsi_host_sg_pool scsi_sg_pools[] = {
-	SP(8),
-	SP(16),
-	SP(32),
-#if (SCSI_MAX_PHYS_SEGMENTS > 32)
-	SP(64),
-#if (SCSI_MAX_PHYS_SEGMENTS > 64)
-	SP(128),
-#if (SCSI_MAX_PHYS_SEGMENTS > 128)
-	SP(256),
-#if (SCSI_MAX_PHYS_SEGMENTS > 256)
-#error SCSI_MAX_PHYS_SEGMENTS is too large
-#endif
-#endif
-#endif
-#endif
-};
+	SP(7),
+	SP(15),
+	SP(31),
+	SP(63),
+	SP(SCSI_MAX_SG_SEGMENTS)
+};
 #undef SP
 
 static void scsi_run_queue(struct request_queue *q);
@@ -701,60 +702,38 @@ static struct scsi_cmnd *scsi_end_reques
 	return NULL;
 }
 
-struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
+static void scsi_free_sgtable(struct scsi_sg_table *sgt)
 {
-	struct scsi_host_sg_pool *sgp;
-	struct scatterlist *sgl;
-
-	BUG_ON(!cmd->use_sg);
-
-	switch (cmd->use_sg) {
-	case 1 ... 8:
-		cmd->sglist_len = 0;
-		break;
-	case 9 ... 16:
-		cmd->sglist_len = 1;
-		break;
-	case 17 ... 32:
-		cmd->sglist_len = 2;
-		break;
-#if (SCSI_MAX_PHYS_SEGMENTS > 32)
-	case 33 ... 64:
-		cmd->sglist_len = 3;
-		break;
-#if (SCSI_MAX_PHYS_SEGMENTS > 64)
-	case 65 ... 128:
-		cmd->sglist_len = 4;
-		break;
-#if (SCSI_MAX_PHYS_SEGMENTS > 128)
-	case 129 ... 256:
-		cmd->sglist_len = 5;
-		break;
-#endif
-#endif
-#endif
-	default:
-		retur
Re: [PATCH v2] add bidi support for block pc requests
FUJITA Tomonori wrote:
> From: Jens Axboe <[EMAIL PROTECTED]>
> Subject: Re: [PATCH v2] add bidi support for block pc requests
> Date: Thu, 17 May 2007 07:48:13 +0200
>
>> On Thu, May 17 2007, FUJITA Tomonori wrote:
>>> From: Jens Axboe <[EMAIL PROTECTED]>
>>> Subject: Re: [PATCH v2] add bidi support for block pc requests
>>> Date: Wed, 16 May 2007 19:53:22 +0200
>>>
>>>> On Wed, May 16 2007, Boaz Harrosh wrote:
>>>>> now there are 2 issues with this:
>>>>> 1. Even if I do blk_queue_max_phys_segments(q, 127); I still get
>>>>>    requests for use_sg=128 which will crash the kernel.
>>>> That sounds like a serious issue, it should definitely not happen. Stuff
>>>> like that would bite other drivers as well, are you absolutely sure that
>>>> is happening? Power-of-2 bug in your code, or in the SCSI code?
>>> Boaz, how do you send requests to the scsi-ml, via fs, sg, or bsg?

These are regular fs (ext3) requests during bootup. The machine will not boot. (Usually from the read-ahead code.) Don't believe me? Look at the second patch, over Tomo's cleanup. If I define SCSI_MAX_SG_SEGMENTS to 127 it will crash even when I did in code:

	blk_queue_max_phys_segments(q, SCSI_MAX_SG_SEGMENTS);

I suppose someone is looking at a different definition, or there is another call I need to do for this to work.

>>>
>>>
>>>>> If anyone wants to see I have done 2 versions of this work. One on top
>>>>> of Jens's sg-chaining work (ver 5). And one on top of Tomo's cleanup
>>>>> git. both can be found here:
>>>>> http://www.bhalevy.com/open-osd/download/scsi_sg_table/
>>>> +#define scsi_for_each_sg(cmd, sg, nseg, __i) \
>>>> +	for (__i = 0, sg = scsi_sglist(cmd); __i < (nseg); __i++, (sg)++)

Yes, leftovers from Tomo's branch. Sorry for the noise. This is in no way finished work, just experimenting. Anyway, in this git tree no one is using this macro. (git-pull of Linus's tree, apply ver5, and this patch.) This branch will work fine because I have not touched the numbers yet, just screwed with the allocations.
>>>> Hmm?
>>> When for_each_sg is ready, we convert scsi_for_each_sg to use
>>> for_each_sg. I finished the cleanups on more than 40 drivers on the
>>> top of your patches.
>> But this is from the patch that is based on my sglist branch, so it
>> looked somewhat odd.
>
> Oops, I didn't see the patch based on the sglist branch. Yeah, it's
> odd though it isn't used, I guess.

Boaz
Re: [PATCH v2] add bidi support for block pc requests
Jens Axboe wrote: > On Wed, May 16 2007, James Bottomley wrote: >> On Wed, 2007-05-16 at 19:53 +0200, Jens Axboe wrote: >>> The 1-page thing isn't a restriction as such, it's just an optimization. >>> The scatterlist allocated is purely a kernel entity, so you could do 4 >>> contig pages and larger ios that way, if higher order allocations were >>> reliable. >>> >>> But you are right in that we need to tweak the sg pool size so that it >>> ends up being a nice size, and not something that either spans a little >>> bit into a second page or doesn't fill a page nicely. On my x86-64 here, >>> a 128 segment sg table is exactly one page (looking at slabinfo). It >>> depends on the allocator whether that is just right, or just a little >>> too much due to management information. >> Actually, if you look at the slab allocation algorithm (particularly >> calculate_slab_order()) you'll find it's not as simplistic as you're >> assuming ... what it actually does is try to allocate > 1 item in n >> pages to reduce the leftovers. > > I'm not assuming anything, I was just being weary of having elements > that are exactly page sized if that would cause a bit of spill into a > second page. Don't tell me that PAGE_SIZE+10 (or whatever it might be) > would ever be an optimal allocation size. > >> Additionally, remember that turning on redzoning, which seems to be >> quite popular nowadays, actually blows out the slab size calculations >> anyway. > > Debugging will always throw these things out the window, we can't and > should not optimize for that. That goes for slab, and for lots of other > things. > >> The bottom line is that it's better for us just to do exactly what we >> need and let the allocation algorithms figure out how to do it >> efficiently rather than trying to second guess them. > > Partly true, it's also silly to just hardcore power-of-2 numbers without > ever bothering to look at what that results in (or even if it fits > normal use patterns). 
>
> We can easily be flexible, so it seems silly not to at least do a bit of
> background research.
>

The thing is that right now everything fits like a glove: i386/32-bit arches have a 16-byte scatterlist struct, 256 to a page; x86_64/64-bit arches have a 32-byte one, and 128 fit exactly in a page. Any code that throws this off will be a performance regression. Call it beginner's luck, or call it someone spending a long night handcrafting it this way. I just think the current system is perfect and we should not touch it. There are other options for bidi. (Just my $0.02.)

Boaz
Re: [PATCH v2] add bidi support for block pc requests
From: Jens Axboe <[EMAIL PROTECTED]> Subject: Re: [PATCH v2] add bidi support for block pc requests Date: Thu, 17 May 2007 07:48:13 +0200 > On Thu, May 17 2007, FUJITA Tomonori wrote: > > From: Jens Axboe <[EMAIL PROTECTED]> > > Subject: Re: [PATCH v2] add bidi support for block pc requests > > Date: Wed, 16 May 2007 19:53:22 +0200 > > > > > On Wed, May 16 2007, Boaz Harrosh wrote: > > > > Boaz Harrosh wrote: > > > > > James Bottomley wrote: > > > > >> > > > > >> There's actually a fourth option you haven't considered: > > > > >> > > > > >> Roll all the required sglist definitions (request_bufflen, > > > > >> request_buffer, use_sg and sglist_len) into the sgtable pools. > > > > >> > > > > > This is a grate Idea. Let me see if I understand what you mean. > > > > > ... > > > > > ... > > > > > > > > Hi Dear James, list. > > > > I have worked on proposed solution (If anyone is interested see > > > > url blow) > > > > > > > > Now it works and all but I hit some problems. > > > > What happens is that in 64 bit architectures, well in x86_64, > > > > the sizeof scatterlist structure is 32 bytes which means that > > > > we can only fit exactly 128 of them in a page. But together with > > > > the scsi_sg_table header we are down to 127. Also if we want > > > > good alignment/packing of other sized pools we want: > > > > static struct scsi_host_sg_pool scsi_sg_pools[] = { > > > > SP(7), > > > > SP(15), > > > > SP(31), > > > > SP(63), > > > > SP(127) > > > > }; > > > > > > > > now there are 2 issues with this: > > > > 1. Even if I do blk_queue_max_phys_segments(q, 127); I still get > > > >requests for use_sg=128 which will crash the kernel. > > > > > > That sounds like a serious issue, it should definitely not happen. Stuff > > > like that would bite other drivers as well, are you absolutely sure that > > > is happening? Power-of-2 bug in your code, or in the SCSI code? > > > > Boaz, how do you send requests to the scsi-ml, via fs, sg, or bsg? 
> > > > > > > > If anyone wants to see I have done 2 versions of this work. One on top > > > > of Jens's sg-chaining work (ver 5). And one on top of Tomo's cleanup > > > > git. both can be found here: > > > > http://www.bhalevy.com/open-osd/download/scsi_sg_table/ > > > > > > +#define scsi_for_each_sg(cmd, sg, nseg, __i) \ > > > + for (__i = 0, sg = scsi_sglist(cmd); __i < (nseg); __i++, (sg)++) > > > > > > Hmm? > > > > When for_each_sg is ready, we convert scsi_for_each_sg to use > > for_each_sg. I finished the cleanups on more than 40 drivers on the > > top of your patches. > > But this is from the patch that is based on my sglist branch, so it > looked somewhat odd. Oops, I didn't see the patch based on the sglist branch. Yeah, it's odd though it isn't used, I guess. - To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] add bidi support for block pc requests
On Thu, May 17 2007, FUJITA Tomonori wrote: > From: Jens Axboe <[EMAIL PROTECTED]> > Subject: Re: [PATCH v2] add bidi support for block pc requests > Date: Wed, 16 May 2007 19:53:22 +0200 > > > On Wed, May 16 2007, Boaz Harrosh wrote: > > > Boaz Harrosh wrote: > > > > James Bottomley wrote: > > > >> > > > >> There's actually a fourth option you haven't considered: > > > >> > > > >> Roll all the required sglist definitions (request_bufflen, > > > >> request_buffer, use_sg and sglist_len) into the sgtable pools. > > > >> > > > > This is a grate Idea. Let me see if I understand what you mean. > > > > ... > > > > ... > > > > > > Hi Dear James, list. > > > I have worked on proposed solution (If anyone is interested see > > > url blow) > > > > > > Now it works and all but I hit some problems. > > > What happens is that in 64 bit architectures, well in x86_64, > > > the sizeof scatterlist structure is 32 bytes which means that > > > we can only fit exactly 128 of them in a page. But together with > > > the scsi_sg_table header we are down to 127. Also if we want > > > good alignment/packing of other sized pools we want: > > > static struct scsi_host_sg_pool scsi_sg_pools[] = { > > > SP(7), > > > SP(15), > > > SP(31), > > > SP(63), > > > SP(127) > > > }; > > > > > > now there are 2 issues with this: > > > 1. Even if I do blk_queue_max_phys_segments(q, 127); I still get > > >requests for use_sg=128 which will crash the kernel. > > > > That sounds like a serious issue, it should definitely not happen. Stuff > > like that would bite other drivers as well, are you absolutely sure that > > is happening? Power-of-2 bug in your code, or in the SCSI code? > > Boaz, how do you send requests to the scsi-ml, via fs, sg, or bsg? > > > > > If anyone wants to see I have done 2 versions of this work. One on top > > > of Jens's sg-chaining work (ver 5). And one on top of Tomo's cleanup > > > git. 
> > > both can be found here:
> > > http://www.bhalevy.com/open-osd/download/scsi_sg_table/
> >
> > +#define scsi_for_each_sg(cmd, sg, nseg, __i) \
> > +	for (__i = 0, sg = scsi_sglist(cmd); __i < (nseg); __i++, (sg)++)
> >
> > Hmm?
>
> When for_each_sg is ready, we convert scsi_for_each_sg to use
> for_each_sg. I finished the cleanups on more than 40 drivers on the
> top of your patches.

But this is from the patch that is based on my sglist branch, so it looked somewhat odd.

-- 
Jens Axboe
Re: [PATCH v2] add bidi support for block pc requests
From: Jens Axboe <[EMAIL PROTECTED]> Subject: Re: [PATCH v2] add bidi support for block pc requests Date: Wed, 16 May 2007 19:53:22 +0200 > On Wed, May 16 2007, Boaz Harrosh wrote: > > Boaz Harrosh wrote: > > > James Bottomley wrote: > > >> > > >> There's actually a fourth option you haven't considered: > > >> > > >> Roll all the required sglist definitions (request_bufflen, > > >> request_buffer, use_sg and sglist_len) into the sgtable pools. > > >> > > > This is a grate Idea. Let me see if I understand what you mean. > > > ... > > > ... > > > > Hi Dear James, list. > > I have worked on proposed solution (If anyone is interested see > > url blow) > > > > Now it works and all but I hit some problems. > > What happens is that in 64 bit architectures, well in x86_64, > > the sizeof scatterlist structure is 32 bytes which means that > > we can only fit exactly 128 of them in a page. But together with > > the scsi_sg_table header we are down to 127. Also if we want > > good alignment/packing of other sized pools we want: > > static struct scsi_host_sg_pool scsi_sg_pools[] = { > > SP(7), > > SP(15), > > SP(31), > > SP(63), > > SP(127) > > }; > > > > now there are 2 issues with this: > > 1. Even if I do blk_queue_max_phys_segments(q, 127); I still get > >requests for use_sg=128 which will crash the kernel. > > That sounds like a serious issue, it should definitely not happen. Stuff > like that would bite other drivers as well, are you absolutely sure that > is happening? Power-of-2 bug in your code, or in the SCSI code? Boaz, how do you send requests to the scsi-ml, via fs, sg, or bsg? > > If anyone wants to see I have done 2 versions of this work. One on top > > of Jens's sg-chaining work (ver 5). And one on top of Tomo's cleanup > > git. both can be found here: > > http://www.bhalevy.com/open-osd/download/scsi_sg_table/ > > +#define scsi_for_each_sg(cmd, sg, nseg, __i) \ > + for (__i = 0, sg = scsi_sglist(cmd); __i < (nseg); __i++, (sg)++) > > Hmm? 
When for_each_sg is ready, we convert scsi_for_each_sg to use for_each_sg. I finished the cleanups on more than 40 drivers on the top of your patches.
Re: [PATCH v2] add bidi support for block pc requests
On Wed, May 16 2007, James Bottomley wrote:
> On Wed, 2007-05-16 at 19:53 +0200, Jens Axboe wrote:
> > The 1-page thing isn't a restriction as such, it's just an optimization.
> > The scatterlist allocated is purely a kernel entity, so you could do 4
> > contig pages and larger ios that way, if higher order allocations were
> > reliable.
> >
> > But you are right in that we need to tweak the sg pool size so that it
> > ends up being a nice size, and not something that either spans a little
> > bit into a second page or doesn't fill a page nicely. On my x86-64 here,
> > a 128 segment sg table is exactly one page (looking at slabinfo). It
> > depends on the allocator whether that is just right, or just a little
> > too much due to management information.
>
> Actually, if you look at the slab allocation algorithm (particularly
> calculate_slab_order()) you'll find it's not as simplistic as you're
> assuming ... what it actually does is try to allocate > 1 item in n
> pages to reduce the leftovers.

I'm not assuming anything, I was just being wary of having elements that are exactly page sized if that would cause a bit of spill into a second page. Don't tell me that PAGE_SIZE+10 (or whatever it might be) would ever be an optimal allocation size.

> Additionally, remember that turning on redzoning, which seems to be
> quite popular nowadays, actually blows out the slab size calculations
> anyway.

Debugging will always throw these things out the window, we can't and should not optimize for that. That goes for slab, and for lots of other things.

> The bottom line is that it's better for us just to do exactly what we
> need and let the allocation algorithms figure out how to do it
> efficiently rather than trying to second guess them.

Partly true, but it's also silly to just hardcode power-of-2 numbers without ever bothering to look at what that results in (or even if it fits normal use patterns).
We can easily be flexible, so it seems silly not to at least do a bit of background research.

-- 
Jens Axboe
Re: [PATCH v2] add bidi support for block pc requests
On Wed, 2007-05-16 at 19:53 +0200, Jens Axboe wrote:
> The 1-page thing isn't a restriction as such, it's just an optimization.
> The scatterlist allocated is purely a kernel entity, so you could do 4
> contig pages and larger ios that way, if higher order allocations were
> reliable.
>
> But you are right in that we need to tweak the sg pool size so that it
> ends up being a nice size, and not something that either spans a little
> bit into a second page or doesn't fill a page nicely. On my x86-64 here,
> a 128 segment sg table is exactly one page (looking at slabinfo). It
> depends on the allocator whether that is just right, or just a little
> too much due to management information.

Actually, if you look at the slab allocation algorithm (particularly calculate_slab_order()) you'll find it's not as simplistic as you're assuming ... what it actually does is try to allocate > 1 item in n pages to reduce the leftovers.

Additionally, remember that turning on redzoning, which seems to be quite popular nowadays, actually blows out the slab size calculations anyway.

The bottom line is that it's better for us just to do exactly what we need and let the allocation algorithms figure out how to do it efficiently rather than trying to second guess them.

James
Re: [PATCH v2] add bidi support for block pc requests
On Wed, May 16 2007, Boaz Harrosh wrote: > Boaz Harrosh wrote: > > James Bottomley wrote: > >> > >> There's actually a fourth option you haven't considered: > >> > >> Roll all the required sglist definitions (request_bufflen, > >> request_buffer, use_sg and sglist_len) into the sgtable pools. > >> > > This is a grate Idea. Let me see if I understand what you mean. > > ... > > ... > > Hi Dear James, list. > I have worked on proposed solution (If anyone is interested see > url blow) > > Now it works and all but I hit some problems. > What happens is that in 64 bit architectures, well in x86_64, > the sizeof scatterlist structure is 32 bytes which means that > we can only fit exactly 128 of them in a page. But together with > the scsi_sg_table header we are down to 127. Also if we want > good alignment/packing of other sized pools we want: > static struct scsi_host_sg_pool scsi_sg_pools[] = { > SP(7), > SP(15), > SP(31), > SP(63), > SP(127) > }; > > now there are 2 issues with this: > 1. Even if I do blk_queue_max_phys_segments(q, 127); I still get >requests for use_sg=128 which will crash the kernel. That sounds like a serious issue, it should definitely not happen. Stuff like that would bite other drivers as well, are you absolutely sure that is happening? Power-of-2 bug in your code, or in the SCSI code? > 2. If I do SPs of 7,15,31,63,128 or even 8,16,32,64,128 it will >boot and work but clearly it does not fit in one page. So either >my sata drivers and iSCSI, which I test, are not sensitive to >scatterlists fit one page. Or kernel gives me 2 contiguous pages? The 1-page thing isn't a restriction as such, it's just an optimization. The scatterlist allocated is purely a kernel entity, so you could do 4 contig pages and larger ios that way, if higher order allocations were reliable. 
But you are right in that we need to tweak the sg pool size so that it ends up being a nice size, and not something that either spans a little bit into a second page or doesn't fill a page nicely. On my x86-64 here, a 128 segment sg table is exactly one page (looking at slabinfo). It depends on the allocator whether that is just right, or just a little too much due to management information.

> I do not see away out of this problem. I think that even with Jens's
> chaining of sg_tables 128 is a magic number that we don't want to cross

Label me skeptical of your findings so far :-)

> It leaves me with option 3 on the bidi front. I think it would be best
> to Just allocate another global mem_pool of scsi_data_buffer(s) just like
> we do for scsi_io_context. The uni scsi_data_buffer will be embedded in
> struct scsi_cmnd and the bidi one will be allocated from this pool.
> I am open to any suggestions.

Two statements on this affair:

- Any sg table size should work, provided that we can safely and reliably allocate it. We stay with 1 page max for that reason alone.
- The max sg number has no magic beyond staying within a single page for allocation reasons. The scatter table really only exists as a temporary way to store and set up the transfer, until it's mapped to a driver private sg structure. That is why the phys segments limit is purely a driver property; it has nothing to do with the hardware or the hardware limits.

> If anyone wants to see I have done 2 versions of this work. One on top
> of Jens's sg-chaining work (ver 5). And one on top of Tomo's cleanup
> git. both can be found here:
> http://www.bhalevy.com/open-osd/download/scsi_sg_table/

+#define scsi_for_each_sg(cmd, sg, nseg, __i) \
+	for (__i = 0, sg = scsi_sglist(cmd); __i < (nseg); __i++, (sg)++)

Hmm?

-- 
Jens Axboe
Re: [PATCH v2] add bidi support for block pc requests
Boaz Harrosh wrote:
> James Bottomley wrote:
>>
>> There's actually a fourth option you haven't considered:
>>
>> Roll all the required sglist definitions (request_bufflen,
>> request_buffer, use_sg and sglist_len) into the sgtable pools.
>>
> This is a great idea. Let me see if I understand what you mean.
> ...
> ...

Hi Dear James, list.

I have worked on the proposed solution (if anyone is interested, see the URL below).

Now it works and all, but I hit some problems. What happens is that on 64-bit architectures, well on x86_64, the sizeof the scatterlist structure is 32 bytes, which means that we can only fit exactly 128 of them in a page. But together with the scsi_sg_table header we are down to 127. Also, if we want good alignment/packing of the other sized pools we want:

static struct scsi_host_sg_pool scsi_sg_pools[] = {
	SP(7),
	SP(15),
	SP(31),
	SP(63),
	SP(127)
};

Now there are 2 issues with this:
1. Even if I do blk_queue_max_phys_segments(q, 127); I still get
   requests for use_sg=128 which will crash the kernel.
2. If I do SPs of 7,15,31,63,128 or even 8,16,32,64,128 it will
   boot and work, but clearly it does not fit in one page. So either
   my sata drivers and iSCSI, which I test, are not sensitive to
   scatterlists fitting in one page, or the kernel gives me 2
   contiguous pages?

I do not see a way out of this problem. I think that even with Jens's chaining of sg_tables, 128 is a magic number that we don't want to cross.

It leaves me with option 3 on the bidi front. I think it would be best to just allocate another global mem_pool of scsi_data_buffer(s), just like we do for scsi_io_context. The uni scsi_data_buffer will be embedded in struct scsi_cmnd and the bidi one will be allocated from this pool. I am open to any suggestions.

If anyone wants to see, I have done 2 versions of this work. One on top of Jens's sg-chaining work (ver 5). And one on top of Tomo's cleanup git.
both can be found here: http://www.bhalevy.com/open-osd/download/scsi_sg_table/

Boaz
Re: [PATCH v2] add bidi support for block pc requests
On Thu, 2007-05-10 at 11:10 -0400, Douglas Gilbert wrote: > > +#define scsi_resid(cmd) ((cmd)->resid) > > I have defined resid in the past as a signed (32 bit) > integer following the CAM spec. The cases are: >- resid=0 : initiator's DMA engine got (or sent?) the >number of bytes it expected >- resid>0 : dxfer_len-resid bytes received, less than >expected >- resid<0 : more bytes received (or could have been) >than indicated by dxfer_len > > Some definitions of resid make it unsigned which rules out > the less common resid<0 case. Can this case happen? Does it > happen in practice? Naturally no more than dxfer_len should > be placed in the scatter gather list. Attempted overrun is usually, and correctly in my opinion, treated as a fatal error in most drivers, so for them the resid < 0 case can never happen. > SPI and SAS do not convey dxfer_len (or data direction) to > a target in their transport frames. This means that the > relevant field in the cdb (e.g. transfer length) dictates > how much data a target sends back to an initiator in the > case of a read like operation. So that opens up the > possibility that dxfer_len and the cdb disagree. Well, actually the drivers do (or those few that pay attention). dxfer_len is used to program the SG lists into the device DMA engine, so drivers can retrieve the actuals from the DMA engine at the end of the transfer and set resid accordingly. > FCP does convey dxfer_len and data_direction to a target. > So a FCP target can detect a mismatch between this and the > cdb and send resid information (either under or over) in its > response frame back to the initiator. Alternatively it > can treat the "over" case as an error (fcp3r04.pdf section > 9.4.2). > iSCSI and SRP? > > The resid<0 case may become more common if a driver or > application does not properly take account of protection > information being sent together with the requested > data. > > So should we keep resid signed? 
I don't really see a compelling reason to alter the driver behaviour (at least for those where overrun is fatal). However, since resid is signed in the current structure, it makes sense to propagate that.

James
Re: [PATCH v2] add bidi support for block pc requests
Re: [PATCH v2] add bidi support for block pc requests
Boaz Harrosh wrote: > FUJITA Tomonori wrote: >> From: FUJITA Tomonori <[EMAIL PROTECTED]> >> Subject: Re: [PATCH v2] add bidi support for block pc requests >> Date: Thu, 10 May 2007 15:53:22 +0900 >> >>> From: Boaz Harrosh <[EMAIL PROTECTED]> >>> Subject: Re: [PATCH v2] add bidi support for block pc requests >>> Date: Wed, 09 May 2007 19:54:32 +0300 >>> >>>> James Bottomley wrote: >>>>> Actually, the first order of business is to use accessors on the command >>>>> pointers in the drivers to free them from the internal layout of the >>>>> structure (and where it is allocated). >>>>> >>>> Thanks! I totally second that. Let me look into my old patches and come >>>> up with all the needed accessors. I hope 3-5 will be enough. >>>> I will send some suggestions tomorrow. >>>>> No, that's why you do the accessors. Convert all of the common drivers >>>>> to accessors on the current structure, then throw the switch to convert >>>>> to the new structure (whatever form is finally settled on). Then any >>>>> unconverted drivers break and people fix the ones they care about. >>>> Last time I was able to compile 97% of drivers and convert by >>>> search-and-replace >>>> the rest. Not a huge deal. >>> We need to remove the non-use-sg code in the drivers and convert >>> them. So it's a bit more complicated than search-and-replace. >> Here's a patch to convert ibmvscsi to use helper functions (though it >> needs more testings). 
>> >> --- >> --- >> diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h >> index d6948d0..799f204 100644 >> --- a/include/scsi/scsi_cmnd.h >> +++ b/include/scsi/scsi_cmnd.h >> @@ -138,4 +138,17 @@ extern void scsi_kunmap_atomic_sg(void * >> extern struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *, gfp_t); >> extern void scsi_free_sgtable(struct scatterlist *, int); >> >> +extern int scsi_dma_map(struct device *dev, struct scsi_cmnd *cmd); >> +extern void scsi_dma_unmap(struct device *dev, struct scsi_cmnd *cmd); >> + >> +/* moved to scatterlist.h after chaining sg */ >> +#define sg_next(sg) ((sg) + 1) >> + >> +#define scsi_for_each_sg(cmd, nseg, i) >> \ >> +for (i = 0, sg = (cmd)->request_buffer; i < nseg; \ >> +sg = sg_next(sg), i++) \ >> + > > Better we do like Jens's patch > +#define for_each_sg(sglist, sg, nr, __i) \ > + for (__i = 0, sg = (sglist); __i < (nr); __i++, sg = sg_next(sg)) > > I think that we should wait for Jens pending patchset and do this work on top > of his, then use his sg macros directly. This way the cleaners can also be > watchfully for any drivers that might brake with big IO sent to them. > >> +#define scsi_sg_count(cmd) ((cmd)->use_sg) >> +#define scsi_bufferlen(cmd) ((cmd)->request_bufflen) >> + >> #endif /* _SCSI_SCSI_CMND_H */ > > Above helpers look good. However I am missing 2 of them: > > 1. > +#define scsi_sgl(cmd) ((struct scatterlist*)(cmd)->request_buffer) > > This is for drivers like iSCSI that do not do any dma mapping, as dma > is done at the lower level in the NICs. And lots of other places that just > transfer the sglist around. > > 2. > +#define scsi_resid(cmd) ((cmd)->resid) I have defined resid in the past as a signed (32 bit) integer following the CAM spec. The cases are: - resid=0 : initiator's DMA engine got (or sent?) 
the number of bytes it expected
- resid>0 : dxfer_len - resid bytes received, less than expected
- resid<0 : more bytes received (or could have been) than indicated by dxfer_len

Some definitions of resid make it unsigned, which rules out the less common resid<0 case. Can this case happen? Does it happen in practice? Naturally no more than dxfer_len should be placed in the scatter-gather list.

SPI and SAS do not convey dxfer_len (or data direction) to a target in their transport frames. This means that the relevant field in the cdb (e.g. transfer length) dictates how much data a target sends back to an initiator in the case of a read-like operation. So that opens up the possibility that dxfer_len and the cdb disagree.

FCP does convey dxfer_len and data_direction to a target. So an FCP target can detect a mismatch between this and the cdb and send resid information (either under or over) in its response frame back to the initiator. Alternatively it can treat the "over" case as an error (fcp3r04.pdf section 9.4.2). iSCSI and SRP?

The resid<0 case may become more common if a driver or application does not properly take account of protection information being sent together with the requested data.

So should we keep resid signed?

Doug Gilbert
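To make the three cases concrete, here is a small userspace sketch of the CAM-style signed-resid semantics described above; the enum and function names are illustrative, not kernel API:

```c
#include <assert.h>

/* Sketch only: classify a CAM-style signed resid. Names are
 * hypothetical, not kernel or CAM identifiers. */
enum resid_case { RESID_EXACT, RESID_UNDERRUN, RESID_OVERRUN };

static enum resid_case classify_resid(int resid)
{
	if (resid == 0)
		return RESID_EXACT;	/* got exactly dxfer_len bytes */
	else if (resid > 0)
		return RESID_UNDERRUN;	/* dxfer_len - resid bytes received */
	else
		return RESID_OVERRUN;	/* target offered more than dxfer_len */
}

/* Bytes actually placed in the buffer: never more than dxfer_len,
 * since the scatter-gather list bounds what can be stored. */
static int bytes_received(int dxfer_len, int resid)
{
	return resid > 0 ? dxfer_len - resid : dxfer_len;
}
```

Note that an unsigned resid cannot represent the overrun case at all, which is the crux of the question.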
Re: [PATCH v2] add bidi support for block pc requests
From: Boaz Harrosh <[EMAIL PROTECTED]>
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Thu, 10 May 2007 15:37:48 +0300

> > +/* moved to scatterlist.h after chaining sg */
> > +#define sg_next(sg) ((sg) + 1)
> > +
> > +#define scsi_for_each_sg(cmd, nseg, i) \
> > + for (i = 0, sg = (cmd)->request_buffer; i < nseg; \
> > + sg = sg_next(sg), i++) \
> > +
>
> Better we do it like Jens's patch:
> +#define for_each_sg(sglist, sg, nr, __i) \
> + for (__i = 0, sg = (sglist); __i < (nr); __i++, sg = sg_next(sg))
>
> I think we should wait for Jens' pending patchset and do this work on top
> of his, then use his sg macros directly. This way the cleaners can also be
> watchful for any drivers that might break with big IO sent to them.

Yeah, we can build on the sg list macro, something like:

#define scsi_for_each_sg(cmd, sg, nseg, __i) \
	for_each_sg((cmd)->request_buffer, sg, nseg, __i)

Seems that Jens converted lots of LLDs to use for_each_sg. If for_each_sg goes in first, we can just reuse it (but after all, we need to touch the LLDs anyway to remove the non-use-sg path). On the other hand, if we go first, there is no for_each_sg conversion in the LLDs yet.

Jens, do you want the sg list stuff done before our cleanups?
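The layering being proposed can be shown with a small userspace mock; struct layouts are simplified stand-ins, and sg_next() here assumes a flat array (chained sg lists would replace it):

```c
#include <assert.h>

/* Mock of scsi_for_each_sg() built on a generic for_each_sg(), as
 * discussed above. struct fields and mock_scsi_cmnd are illustrative. */
struct scatterlist { unsigned int length; };

#define sg_next(sg) ((sg) + 1)	/* flat array; chaining changes this */

#define for_each_sg(sglist, sg, nr, __i) \
	for (__i = 0, sg = (sglist); __i < (nr); __i++, sg = sg_next(sg))

struct mock_scsi_cmnd {
	struct scatterlist *request_buffer;
	unsigned short use_sg;
};

/* The SCSI wrapper is then a one-liner over the generic macro. */
#define scsi_for_each_sg(cmd, sg, nseg, __i) \
	for_each_sg((cmd)->request_buffer, sg, nseg, __i)

/* A typical LLD pattern: walk the command's sg list summing lengths. */
static unsigned int total_bufflen(struct mock_scsi_cmnd *cmd)
{
	struct scatterlist *sg;
	unsigned int sum = 0;
	int i;

	scsi_for_each_sg(cmd, sg, cmd->use_sg, i)
		sum += sg->length;
	return sum;
}
```

The point of the wrapper is that LLD loops written this way keep working when sg_next() later becomes chain-aware.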
Re: [PATCH v2] add bidi support for block pc requests
FUJITA Tomonori wrote: > From: FUJITA Tomonori <[EMAIL PROTECTED]> > Subject: Re: [PATCH v2] add bidi support for block pc requests > Date: Thu, 10 May 2007 15:53:22 +0900 > >> From: Boaz Harrosh <[EMAIL PROTECTED]> >> Subject: Re: [PATCH v2] add bidi support for block pc requests >> Date: Wed, 09 May 2007 19:54:32 +0300 >> >>> James Bottomley wrote: >>>> Actually, the first order of business is to use accessors on the command >>>> pointers in the drivers to free them from the internal layout of the >>>> structure (and where it is allocated). >>>> >>> Thanks! I totally second that. Let me look into my old patches and come >>> up with all the needed accessors. I hope 3-5 will be enough. >>> I will send some suggestions tomorrow. >>>> No, that's why you do the accessors. Convert all of the common drivers >>>> to accessors on the current structure, then throw the switch to convert >>>> to the new structure (whatever form is finally settled on). Then any >>>> unconverted drivers break and people fix the ones they care about. >>> Last time I was able to compile 97% of drivers and convert by >>> search-and-replace >>> the rest. Not a huge deal. >> We need to remove the non-use-sg code in the drivers and convert >> them. So it's a bit more complicated than search-and-replace. > > Here's a patch to convert ibmvscsi to use helper functions (though it > needs more testings). 
>
> ---
> ---
> diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
> index d6948d0..799f204 100644
> --- a/include/scsi/scsi_cmnd.h
> +++ b/include/scsi/scsi_cmnd.h
> @@ -138,4 +138,17 @@ extern void scsi_kunmap_atomic_sg(void *
> extern struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *, gfp_t);
> extern void scsi_free_sgtable(struct scatterlist *, int);
>
> +extern int scsi_dma_map(struct device *dev, struct scsi_cmnd *cmd);
> +extern void scsi_dma_unmap(struct device *dev, struct scsi_cmnd *cmd);
> +
> +/* moved to scatterlist.h after chaining sg */
> +#define sg_next(sg) ((sg) + 1)
> +
> +#define scsi_for_each_sg(cmd, nseg, i) \
> + for (i = 0, sg = (cmd)->request_buffer; i < nseg; \
> + sg = sg_next(sg), i++) \
> +

Better we do it like Jens's patch:

+#define for_each_sg(sglist, sg, nr, __i) \
+	for (__i = 0, sg = (sglist); __i < (nr); __i++, sg = sg_next(sg))

I think we should wait for Jens' pending patchset and do this work on top of his, then use his sg macros directly. This way the cleaners can also be watchful for any drivers that might break with big IO sent to them.

> +#define scsi_sg_count(cmd) ((cmd)->use_sg)
> +#define scsi_bufferlen(cmd) ((cmd)->request_bufflen)
> +
> #endif /* _SCSI_SCSI_CMND_H */

Above helpers look good. However, I am missing 2 of them:

1.
+#define scsi_sgl(cmd) ((struct scatterlist *)(cmd)->request_buffer)

This is for drivers like iSCSI that do not do any DMA mapping, as DMA is done at the lower level in the NICs. And for lots of other places that just pass the sglist around.

2.
+#define scsi_resid(cmd) ((cmd)->resid)

Boaz
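Why these accessors matter for the later struct consolidation can be seen in a small mock: once only the macros touch the layout, request_buffer/use_sg can move into an sgtable without editing every driver. The command struct here is a simplified stand-in, not the real scsi_cmnd:

```c
#include <assert.h>

/* Mock of the accessor pattern under discussion; scsi_sgl() and
 * scsi_sg_count() mirror the macros quoted above, the struct is a
 * simplified illustration. */
struct scatterlist { unsigned int length; };

struct mock_scsi_cmnd {
	void *request_buffer;
	unsigned short use_sg;
};

#define scsi_sgl(cmd)      ((struct scatterlist *)(cmd)->request_buffer)
#define scsi_sg_count(cmd) ((cmd)->use_sg)

/* Before the conversion a driver would open-code
 *     sg = cmd->request_buffer;  for (i = 0; i < cmd->use_sg; ...)
 * After it, only the accessors know where the sg list lives: */
static unsigned int sum_segments(struct mock_scsi_cmnd *cmd)
{
	struct scatterlist *sg = scsi_sgl(cmd);
	unsigned int i, sum = 0;

	for (i = 0; i < scsi_sg_count(cmd); i++)
		sum += sg[i].length;
	return sum;
}
```

Changing the underlying field names later then means changing two macro bodies, not every LLD.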
Re: [PATCH v2] add bidi support for block pc requests
From: FUJITA Tomonori <[EMAIL PROTECTED]> Subject: Re: [PATCH v2] add bidi support for block pc requests Date: Thu, 10 May 2007 15:53:22 +0900 > From: Boaz Harrosh <[EMAIL PROTECTED]> > Subject: Re: [PATCH v2] add bidi support for block pc requests > Date: Wed, 09 May 2007 19:54:32 +0300 > > > James Bottomley wrote: > > > Actually, the first order of business is to use accessors on the command > > > pointers in the drivers to free them from the internal layout of the > > > structure (and where it is allocated). > > > > > Thanks! I totally second that. Let me look into my old patches and come > > up with all the needed accessors. I hope 3-5 will be enough. > > I will send some suggestions tomorrow. > > > > > > No, that's why you do the accessors. Convert all of the common drivers > > > to accessors on the current structure, then throw the switch to convert > > > to the new structure (whatever form is finally settled on). Then any > > > unconverted drivers break and people fix the ones they care about. > > > > Last time I was able to compile 97% of drivers and convert by > > search-and-replace > > the rest. Not a huge deal. > > We need to remove the non-use-sg code in the drivers and convert > them. So it's a bit more complicated than search-and-replace. Here's a patch to convert ibmvscsi to use helper functions (though it needs more testings). 
--- diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c index fbc1d5c..fb764ff 100644 --- a/drivers/scsi/ibmvscsi/ibmvscsi.c +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c @@ -353,20 +353,18 @@ static void unmap_cmd_data(struct srp_cm } } -static int map_sg_list(int num_entries, - struct scatterlist *sg, - struct srp_direct_buf *md) +static int map_sg_list(struct scsi_cmnd *cmd, int nseg, struct srp_direct_buf *md) { int i; + struct scatterlist *sg; u64 total_length = 0; - for (i = 0; i < num_entries; ++i) { + scsi_for_each_sg(cmd, nseg, i) { struct srp_direct_buf *descr = md + i; - struct scatterlist *sg_entry = &sg[i]; - descr->va = sg_dma_address(sg_entry); - descr->len = sg_dma_len(sg_entry); + descr->va = sg_dma_address(sg); + descr->len = sg_dma_len(sg); descr->key = 0; - total_length += sg_dma_len(sg_entry); + total_length += sg_dma_len(sg); } return total_length; } @@ -384,43 +382,39 @@ static int map_sg_data(struct scsi_cmnd struct srp_event_struct *evt_struct, struct srp_cmd *srp_cmd, struct device *dev) { - int sg_mapped; u64 total_length = 0; - struct scatterlist *sg = cmd->request_buffer; struct srp_direct_buf *data = (struct srp_direct_buf *) srp_cmd->add_data; struct srp_indirect_buf *indirect = (struct srp_indirect_buf *) data; - sg_mapped = dma_map_sg(dev, sg, cmd->use_sg, DMA_BIDIRECTIONAL); - - if (sg_mapped == 0) + sg_mapped = scsi_dma_map(dev, cmd); + if (!sg_mapped) + return 1; + else if (sg_mapped < 0) return 0; + else if (sg_mapped > SG_ALL) { + printk(KERN_ERR + "ibmvscsi: More than %d mapped sg entries, got %d\n", + SG_ALL, sg_mapped); + return 0; + } set_srp_direction(cmd, srp_cmd, sg_mapped); /* special case; we can use a single direct descriptor */ if (sg_mapped == 1) { - data->va = sg_dma_address(&sg[0]); - data->len = sg_dma_len(&sg[0]); - data->key = 0; + map_sg_list(cmd, sg_mapped, data); return 1; } - if (sg_mapped > SG_ALL) { - printk(KERN_ERR - "ibmvscsi: More than %d mapped sg entries, got %d\n", - SG_ALL, 
sg_mapped); - return 0; - } - indirect->table_desc.va = 0; indirect->table_desc.len = sg_mapped * sizeof(struct srp_direct_buf); indirect->table_desc.key = 0; if (sg_mapped <= MAX_INDIRECT_BUFS) { - total_length = map_sg_list(sg_mapped, sg, + total_length = map_sg_list(cmd, sg_mapped, &indirect->desc_list[0]); indirect->len = total_length; return 1; @@ -429,61 +423,27 @@ static int map_sg_data(struct scsi_cmnd /* get indirect table */ if (!evt_struct->ext_list) { evt_s
Re: [PATCH v2] add bidi support for block pc requests
From: Boaz Harrosh <[EMAIL PROTECTED]>
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Wed, 09 May 2007 19:54:32 +0300

> James Bottomley wrote:
> > Actually, the first order of business is to use accessors on the command
> > pointers in the drivers to free them from the internal layout of the
> > structure (and where it is allocated).
> >
> Thanks! I totally second that. Let me look into my old patches and come
> up with all the needed accessors. I hope 3-5 will be enough.
> I will send some suggestions tomorrow.
>
> > No, that's why you do the accessors. Convert all of the common drivers
> > to accessors on the current structure, then throw the switch to convert
> > to the new structure (whatever form is finally settled on). Then any
> > unconverted drivers break and people fix the ones they care about.
>
> Last time I was able to compile 97% of drivers and convert by
> search-and-replace
> the rest. Not a huge deal.

We need to remove the non-use-sg code in the drivers and convert them. So it's a bit more complicated than search-and-replace.
Re: [PATCH v2] add bidi support for block pc requests
James Bottomley wrote:
> Actually, the first order of business is to use accessors on the command
> pointers in the drivers to free them from the internal layout of the
> structure (and where it is allocated).
>
Thanks! I totally second that. Let me look into my old patches and come up with all the needed accessors. I hope 3-5 will be enough. I will send some suggestions tomorrow.

> No, that's why you do the accessors. Convert all of the common drivers
> to accessors on the current structure, then throw the switch to convert
> to the new structure (whatever form is finally settled on). Then any
> unconverted drivers break and people fix the ones they care about.

Last time I was able to compile 97% of drivers and convert by search-and-replace the rest. Not a huge deal.

> James
Re: [PATCH v2] add bidi support for block pc requests
From: Boaz Harrosh <[EMAIL PROTECTED]> Subject: Re: [PATCH v2] add bidi support for block pc requests Date: Wed, 09 May 2007 16:58:24 +0300 > FUJITA Tomonori wrote: > > From: Boaz Harrosh <[EMAIL PROTECTED]> > > Subject: Re: [PATCH v2] add bidi support for block pc requests > > Date: Wed, 09 May 2007 10:46:34 +0300 > > > >>> Roll all the required sglist definitions (request_bufflen, > >>> request_buffer, use_sg and sglist_len) into the sgtable pools. > >>> > >>> We're getting very close to the point where someone gets to sweep > >>> through the drivers eliminating the now superfluous non-sg path in the > >>> queuecommand. When that happens the only cases become no transfer or SG > >>> backed commands. At this point we can do a consolidation of the struct > >>> scsi_cmnd fields. This does represent the ideal time to sweep the sg > >>> list handling fields into the sgtable and simply have a single pointer > >>> to struct sgtable in the scsi_cmnd (== NULL is the signal for a no > >>> transfer command). > >>> > >> This is a grate Idea. Let me see if I understand what you mean. > >> 1. An sgtable is a single allocation with an sgtable header type > >>at the begining and a veriable size array of struct scatterlist. > >>something like: > >>struct sgtable { > >>struct sgtable_header { > >>unsigned sg_count, sglist_len, length; > >>struct sgtable* next; //for Jens's big io > >>} hdr; > >>struct scatterlist sglist[]; > >>} > > > > Can we have more simple sgtable? > > > > struct sgtable { > > unsigned use_sg; > > unsigned length; > > unsigned sglist_len; > > struct scatterlist sglist[0]; > > }; > > > Yes sure. It was just an example. > One comment, use_sg => sg_count. 
> By the way I have forgotten some fields so it should be:
>
> struct sgtable {
> 	unsigned short sg_count;
> 	unsigned short sg_pool; /* note that this is the pool index and not the
> 	actual count */
> 	unsigned length;
> 	unsigned resid;
> 	struct scatterlist sglist[0];
> };
>
> resid/length together with request->data_len can be optimized, this is the
> current system.
>
> > Then we could do something like this:
> >
> > struct scsi_host_sgtable_pool {
> > 	size_t size;
> > 	char *name;
> > 	struct kmem_cache *slab;
> > 	mempool_t *pool;
> > };
> >
> > int __init scsi_init_queue(void)
> > {
> > 	for (i = 0; i < SG_MEMPOOL_NR; i++) {
> > 		struct scsi_host_sgtable_pool *sgtbp = scsi_sgtable_pools + i;
> > 		int size = sizeof(struct sgtable) + sgp->size * sizeof(struct scatterlist);
> >
> > 		sgp->slab = kmem_cache_create(sgp->name, size, 0,
> > 				SLAB_HWCACHE_ALIGN, NULL, NULL);
> > 		if (!sgp->slab) {
> > 			printk(KERN_ERR "SCSI: can't init sg slab %s\n",
> > 				sgp->name);
> > 		}
> >
> > 		sgp->pool = mempool_create_slab_pool(SG_MEMPOOL_SIZE,
> > 				sgp->slab);
>
> I think we can do a better job here by fitting exactly the number of
> scatterlist entries that will take up full pages including the header.
> This is size-dependent and can be compile-time calculated. For example on
> x86_64, with the header, 145 scatterlist entries will fit in a page, so
> that is the magic number for this arch.
>
> could someone explain why we need scatterlist-max-count a base-2 number?

To let the slab allocator do a better job? We can improve these issues later without disturbing the LLDs. No need to do this right now.

> > Jens' chaining sg lists adds sg->next so we don't need
> > sgtable->next. We can just add __use_sg to struct sgtable.
>
> Yes but it uses the last struct scatterlist for the ->next, this way
> it is saved. On the other hand it wastes a pointer in small IOs. So
> I guess this is Jens's call.
> If the "->next" in both cases will point to a struct sgtable above

I think that some non-SCSI stuff also needs sg chaining, so sg needs a common chaining scheme: sg->next, the crypto layer's hack (to avoid adding sg->next), etc. Instead of using sg's own chaining, having sgtable->next is wrong.
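The chaining trade-off being debated, where the last scatterlist slot of a block is sacrificed to point at the next block, can be sketched in userspace. The low-bit tag and field names here are illustrative, not Jens's actual implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of chained scatterlists: sg_next() is no longer always
 * "sg + 1" because an entry may be a link to another block.
 * Layout and the low-bit marker are assumptions for illustration. */
struct scatterlist {
	uintptr_t page_link;	/* low bit set => entry is a chain link */
	unsigned int length;
};

static int sg_is_chain(struct scatterlist *sg)
{
	return sg->page_link & 1;
}

static struct scatterlist *sg_next(struct scatterlist *sg)
{
	struct scatterlist *next = sg + 1;

	if (sg_is_chain(next))	/* follow the link into the next block */
		next = (struct scatterlist *)(next->page_link & ~(uintptr_t)1);
	return next;
}

/* The last entry of prv stops carrying data and becomes the link. */
static void sg_chain(struct scatterlist *prv, unsigned int prv_nents,
		     struct scatterlist *nxt)
{
	prv[prv_nents - 1].page_link = (uintptr_t)nxt | 1;
}
```

This shows the cost Boaz mentions: one slot per block is lost to the link, which is why small IOs would rather not pay for it.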
Re: [PATCH v2] add bidi support for block pc requests
On Wed, 2007-05-09 at 16:58 +0300, Boaz Harrosh wrote: > >> 1. An sgtable is a single allocation with an sgtable header type > >>at the begining and a veriable size array of struct scatterlist. > >>something like: > >>struct sgtable { > >>struct sgtable_header { > >>unsigned sg_count, sglist_len, length; > >>struct sgtable* next; //for Jens's big io > >>} hdr; > >>struct scatterlist sglist[]; > >>} > > > > Can we have more simple sgtable? > > > > struct sgtable { > > unsigned use_sg; > > unsigned length; > > unsigned sglist_len; > > struct scatterlist sglist[0]; > > }; > > > Yes sure. It was just an example. > One comment, use_sg => sg_count. > By the way I have forgotten some fields so it should be: > > struct sgtable { > unsigned short sg_count; > unsigned short sg_pool; /* note that this is the pool index and not the > actual count */ > unsigned length; > unsigned resid; > struct scatterlist sglist[0]; > }; > > resid/length together with request->data_len can be optimized, this is the > current system. Actually, the first order of business is to use accessors on the command pointers in the drivers to free them from the internal layout of the structure (and where it is allocated). 
> > > > Then we could do something like this: > > > > struct scsi_host_sgtable_pool { > > size_t size; > > char *name; > > struct kmem_cache *slab; > > mempool_t *pool; > > }; > > > > int __init scsi_init_queue(void) > > { > > for (i = 0; i < SG_MEMPOOL_NR; i++) { > > struct scsi_host_sgtable_pool *sgtbp = scsi_sgtable_pools + i; > > int size = sizeof(struct sgtable) + sgp->size * sizeof(struct > > scatterlist); > > > > sgp->slab = kmem_cache_create(sgp->name, size, 0, > > SLAB_HWCACHE_ALIGN, NULL, NULL); > > if (!sgp->slab) { > > printk(KERN_ERR "SCSI: can't init sg slab %s\n", > > sgp->name); > > } > > > > sgp->pool = mempool_create_slab_pool(SG_MEMPOOL_SIZE, > > sgp->slab); > > > > > I think we can do a better job here by fitting exactly the number of > scatterlist > entries that will take up full pages including the header. This is sizes > dependent and can be compile-time calculated. For example in x86_64, with > header, > 145 scatterlist will fit in a page so this is kind of magic number for this > arch. > > could someone explain why we need scatterlist-max-count a base-2 number? Power of 2 is the most efficient way to manage space vs pool number in an arbitrarily sized system. This, too, can be accessor abstracted ... > > Jens' chaining sg lists adds sg->next so we don't need > > sgtable->next. We can just add __use_sg to struct sgtable. > > > Yes but it uses the last struct scatterlist for the ->next, this way it is > saved. > On the other hand it wastes a pointer in small IOs. So I guess this is Jens's > call. > If the "->next" in both cases will point to a struct sgtable above than we > don't need > __use_sg since sg_pool(sglist_len) holds the pool this came from. If not > __use_sg > must be added to struct sgtable. > > It looks like proposed change races with Jens's work so it's better be done > after > his. It could be nice if he incorporates some of the constrains from here > before > hand. > > > > >> 2. 
The way we can do this in stages: Meaning put up code that has > >>both sets of API, Transfer drivers one-by-one to new API, deprecate > >>old API for a kernel cycle or two. Than submit last piece of > >>code that removes the old API. > >>It can be done. We just need to copy sgtable_header fields > >>to the old fields, and let them stick around for a while. > > > > That's not bad, but can we convert all the LLDs all at once? No, that's why you do the accessors. Convert all of the common drivers to accessors on the current structure, then throw the switch to convert to the new structure (whatever form is finally settled on). Then any unconverted drivers break and people fix the ones they care about. > > The changes to scsi mid-layer are almost done. Probabaly, you can just > > add sgtable stuff (and scsi_dma_map/scsi_dma_unmap helper functions > > that Christoph proposed long ago) to my previous patch. It can be done > > for several hours. So you have enough time for the LLDs' changes. > I am here to do any work needed. I will wait to see what is decided. > > I already have a 2.6.20 tree with all llds converted to things like > "scsi_uni(cmd)->sg" where scsi_uni() was pointing to: > struct scsi_data_buffer { > unsigned short sg_count; > unsigned short sg_pool; /* note that this is the pool index and not the > actual count */ > unsigned length; > unsigned resid; > struct scatterlist *sg; > }; > a
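James's point about power-of-2 pool sizes can be made concrete: a request for any segment count maps to the smallest pool that fits it, wasting at most about half a slot's worth of entries while keeping the number of pools small. The sizes below roughly mirror the classic scsi_lib.c set but are illustrative:

```c
#include <assert.h>

/* Sketch of power-of-2 sg mempool selection; sizes are assumptions. */
static const unsigned short scsi_sg_pool_sizes[] = { 8, 16, 32, 64, 128, 256 };
#define SG_MEMPOOL_NR \
	(sizeof(scsi_sg_pool_sizes) / sizeof(scsi_sg_pool_sizes[0]))

/* Pick the smallest pool whose size covers nents. */
static int scsi_sgtable_index(unsigned int nents)
{
	unsigned int i;

	for (i = 0; i < SG_MEMPOOL_NR; i++)
		if (nents <= scsi_sg_pool_sizes[i])
			return (int)i;
	return -1;	/* larger than the biggest pool */
}
```

Six pools cover every count from 1 to 256; exact-fit pools per arch (the alternative Boaz raises) would save memory per allocation but multiply the number of pools to manage.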
Re: [PATCH v2] add bidi support for block pc requests
FUJITA Tomonori wrote: > From: Boaz Harrosh <[EMAIL PROTECTED]> > Subject: Re: [PATCH v2] add bidi support for block pc requests > Date: Wed, 09 May 2007 10:46:34 +0300 > >>> Roll all the required sglist definitions (request_bufflen, >>> request_buffer, use_sg and sglist_len) into the sgtable pools. >>> >>> We're getting very close to the point where someone gets to sweep >>> through the drivers eliminating the now superfluous non-sg path in the >>> queuecommand. When that happens the only cases become no transfer or SG >>> backed commands. At this point we can do a consolidation of the struct >>> scsi_cmnd fields. This does represent the ideal time to sweep the sg >>> list handling fields into the sgtable and simply have a single pointer >>> to struct sgtable in the scsi_cmnd (== NULL is the signal for a no >>> transfer command). >>> >> This is a grate Idea. Let me see if I understand what you mean. >> 1. An sgtable is a single allocation with an sgtable header type >>at the begining and a veriable size array of struct scatterlist. >>something like: >>struct sgtable { >> struct sgtable_header { >> unsigned sg_count, sglist_len, length; >> struct sgtable* next; //for Jens's big io >> } hdr; >> struct scatterlist sglist[]; >>} > > Can we have more simple sgtable? > > struct sgtable { > unsigned use_sg; > unsigned length; > unsigned sglist_len; > struct scatterlist sglist[0]; > }; > Yes sure. It was just an example. One comment, use_sg => sg_count. By the way I have forgotten some fields so it should be: struct sgtable { unsigned short sg_count; unsigned short sg_pool; /* note that this is the pool index and not the actual count */ unsigned length; unsigned resid; struct scatterlist sglist[0]; }; resid/length together with request->data_len can be optimized, this is the current system. 
>
> Then we could do something like this:
>
> struct scsi_host_sgtable_pool {
> 	size_t size;
> 	char *name;
> 	struct kmem_cache *slab;
> 	mempool_t *pool;
> };
>
> int __init scsi_init_queue(void)
> {
> 	for (i = 0; i < SG_MEMPOOL_NR; i++) {
> 		struct scsi_host_sgtable_pool *sgtbp = scsi_sgtable_pools + i;
> 		int size = sizeof(struct sgtable) + sgp->size * sizeof(struct scatterlist);
>
> 		sgp->slab = kmem_cache_create(sgp->name, size, 0,
> 				SLAB_HWCACHE_ALIGN, NULL, NULL);
> 		if (!sgp->slab) {
> 			printk(KERN_ERR "SCSI: can't init sg slab %s\n",
> 				sgp->name);
> 		}
>
> 		sgp->pool = mempool_create_slab_pool(SG_MEMPOOL_SIZE,
> 				sgp->slab);
>

I think we can do a better job here by fitting exactly the number of scatterlist entries that will take up full pages, including the header. This is size-dependent and can be calculated at compile time. For example, on x86_64, with the header, 145 scatterlist entries will fit in a page, so that is the magic number for this arch.

Could someone explain why the scatterlist max count needs to be a power of 2?

> Jens' chaining sg lists adds sg->next so we don't need
> sgtable->next. We can just add __use_sg to struct sgtable.
>

Yes, but it uses the last struct scatterlist for the ->next, so that pointer is saved. On the other hand it wastes a pointer on small IOs, so I guess this is Jens's call. If the "->next" in both cases will point to a struct sgtable as above, then we don't need __use_sg, since sg_pool (sglist_len) holds the pool this came from. If not, __use_sg must be added to struct sgtable.

It looks like the proposed change races with Jens's work, so it had better be done after his. It would be nice if he incorporated some of the constraints from here beforehand.
We just need to copy sgtable_header fields >>to the old fields, and let them stick around for a while. > > That's not bad, but can we convert all the LLDs all at once? > > The changes to scsi mid-layer are almost done. Probabaly, you can just > add sgtable stuff (and scsi_dma_map/scsi_dma_unmap helper functions >
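Boaz's "145 entries per page" figure above can be checked with back-of-envelope arithmetic. The sizes here are assumptions for illustration: a 4 KiB page, a 16-byte sgtable header, and a 28-byte scatterlist entry (roughly the unpadded x86_64 layout of the day):

```c
#include <assert.h>

/* Illustrative exact-fit calculation for an sgtable that fills one
 * page; all three sizes are assumed, not taken from kernel headers. */
enum {
	MOCK_PAGE_SIZE	 = 4096,
	MOCK_HEADER_SIZE = 16,
	MOCK_SG_ENT_SIZE = 28,
};

/* How many scatterlist entries fit after the header in one page. */
static unsigned int sg_ents_per_page(void)
{
	return (MOCK_PAGE_SIZE - MOCK_HEADER_SIZE) / MOCK_SG_ENT_SIZE;
}
```

With these assumed sizes, (4096 - 16) / 28 = 145, matching the figure quoted for this arch; with structure padding the real number would differ, which is exactly why it is "size-dependent".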
Re: [PATCH v2] add bidi support for block pc requests
From: Boaz Harrosh <[EMAIL PROTECTED]> Subject: Re: [PATCH v2] add bidi support for block pc requests Date: Wed, 09 May 2007 10:46:34 +0300 > > Roll all the required sglist definitions (request_bufflen, > > request_buffer, use_sg and sglist_len) into the sgtable pools. > > > > We're getting very close to the point where someone gets to sweep > > through the drivers eliminating the now superfluous non-sg path in the > > queuecommand. When that happens the only cases become no transfer or SG > > backed commands. At this point we can do a consolidation of the struct > > scsi_cmnd fields. This does represent the ideal time to sweep the sg > > list handling fields into the sgtable and simply have a single pointer > > to struct sgtable in the scsi_cmnd (== NULL is the signal for a no > > transfer command). > > > This is a grate Idea. Let me see if I understand what you mean. > 1. An sgtable is a single allocation with an sgtable header type >at the begining and a veriable size array of struct scatterlist. >something like: >struct sgtable { > struct sgtable_header { > unsigned sg_count, sglist_len, length; > struct sgtable* next; //for Jens's big io > } hdr; > struct scatterlist sglist[]; >} Can we have more simple sgtable? struct sgtable { unsigned use_sg; unsigned length; unsigned sglist_len; struct scatterlist sglist[0]; }; Then we could do something like this: struct scsi_host_sgtable_pool { size_t size; char *name; struct kmem_cache *slab; mempool_t *pool; }; int __init scsi_init_queue(void) { for (i = 0; i < SG_MEMPOOL_NR; i++) { struct scsi_host_sgtable_pool *sgtbp = scsi_sgtable_pools + i; int size = sizeof(struct sgtable) + sgp->size * sizeof(struct scatterlist); sgp->slab = kmem_cache_create(sgp->name, size, 0, SLAB_HWCACHE_ALIGN, NULL, NULL); if (!sgp->slab) { printk(KERN_ERR "SCSI: can't init sg slab %s\n", sgp->name); } sgp->pool = mempool_create_slab_pool(SG_MEMPOOL_SIZE, sgp->slab); Jens' chaining sg lists adds sg->next so we don't need sgtable->next. 
We can just add __use_sg to struct sgtable.

> 2. The way we can do this in stages: Meaning put up code that has
> both sets of API, transfer drivers one-by-one to the new API, deprecate
> the old API for a kernel cycle or two. Then submit the last piece of
> code that removes the old API.
> It can be done. We just need to copy the sgtable_header fields
> to the old fields, and let them stick around for a while.

That's not bad, but can we convert all the LLDs all at once?

The changes to the scsi mid-layer are almost done. Probably, you can just add the sgtable stuff (and the scsi_dma_map/scsi_dma_unmap helper functions that Christoph proposed long ago) to my previous patch. It can be done in several hours. So you have enough time for the LLDs' changes.

> 3. The second bidi sgtable will hang on request->next_rq->special.

I think so.
From: James Bottomley <[EMAIL PROTECTED]>
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Tue, 08 May 2007 15:01:37 -0500

> Roll all the required sglist definitions (request_bufflen,
> request_buffer, use_sg and sglist_len) into the sgtable pools.
>
> We're getting very close to the point where someone gets to sweep
> through the drivers eliminating the now superfluous non-sg path in the
> queuecommand.

The non-use-sg case is still alive?
Re: [PATCH v2] add bidi support for block pc requests
James Bottomley wrote:
> I think you'll find that kzalloc comes directly out of a slab for this
> size of allocation anyway ... you mean you want to see a dedicated pool
> for this specific allocation?

Yes. As you said below, so we can always send IO for "forward progress of freeing memory". My test machine is a Linux cluster in front of pNFS over OSD. The HPC cluster is diskless; it will reach this situation very fast.

> There's another problem in that it destroys our forward progress
> guarantee. There's always a single reserve command for every HBA so
> that forward progress for freeing memory can always be made in the
> system even if the command slab is out and we have to reclaim memory
> through a HBA with no outstanding commands. Allocating two commands per
> bidirectional request hoses that guarantee ... it could be fixed up by
> increasing the reserve pool to 2, but that's adding further unwanted
> complexity ...

Thanks for catching it! I was afraid of that. If we stick with this solution in the interim, until we do what you suggested below, we will need to put up one more reserve for bidi. It should not be a complicated pool thing, just one more reserved command for the bidi case.

> There's actually a fourth option you haven't considered:
>
> Roll all the required sglist definitions (request_bufflen,
> request_buffer, use_sg and sglist_len) into the sgtable pools.
>
> We're getting very close to the point where someone gets to sweep
> through the drivers eliminating the now superfluous non-sg path in the
> queuecommand. When that happens the only cases become no transfer or SG
> backed commands. At this point we can do a consolidation of the struct
> scsi_cmnd fields. This does represent the ideal time to sweep the sg
> list handling fields into the sgtable and simply have a single pointer
> to struct sgtable in the scsi_cmnd (== NULL is the signal for a no
> transfer command).

This is a great idea. Let me see if I understand what you mean.

1. An sgtable is a single allocation with an sgtable header type at the
   beginning and a variable-size array of struct scatterlist, something
   like:

   struct sgtable {
	struct sgtable_header {
		unsigned sg_count, sglist_len, length;
		struct sgtable *next;	/* for Jens's big io */
	} hdr;
	struct scatterlist sglist[];
   }

   Slabs are put up for the above sgtable in different sizes, as done
   today. (Should they be sized differently per arch to align on page
   boundaries?)

2. The way we can do this in stages: put up code that has both sets of
   API, transfer drivers one-by-one to the new API, deprecate the old
   API for a kernel cycle or two, then submit a last piece of code that
   removes the old API.
   It can be done. We just need to copy the sgtable_header fields to the
   old fields, and let them stick around for a while.

3. The second bidi sgtable will hang on request->next_rq->special.

> James

If everyone agrees on something like the above, I can do it right away. It's a solution I wouldn't even dream of.

Thanks
Boaz
Re: [PATCH v2] add bidi support for block pc requests
On Tue, 2007-05-08 at 21:53 +0300, Boaz Harrosh wrote:
> Before I get to my main concern here I have one comment: in
> scsi_get_cmd_from_req() there is a code path in which a scsi_cmnd is
> taken from special and is not newly allocated. It is best to move the
> bidi allocation to scsi_get_cmd_from_req() and allocate a new sdb only
> if we get a new command. (See my attached patch for an example)
>
> OK! My main concern is with kzalloc(sizeof(*sdb), GFP_ATOMIC);
> All IO allocations should come from slabs in case of a memory-starved
> system.

I think you'll find that kzalloc comes directly out of a slab for this size of allocation anyway ... you mean you want to see a dedicated pool for this specific allocation?

> There are 3 possible solutions I see:
> 1. Use struct scsi_cmnd for bidi_read and not a special
>    scsi_data_buffer.
>    Attached is the above solution. It is by far the simplest of the
>    three. So simple that even Jens's BigIO patch can almost apply at
>    scsi_lib.c (and vice versa).
>    What's hung on request->next_rq->special is a second scsi_cmnd,
>    taken from the regular command pool. This solution, though wasteful
>    of some memory, is very stable.

There's another problem in that it destroys our forward progress guarantee. There's always a single reserve command for every HBA so that forward progress for freeing memory can always be made in the system, even if the command slab is out and we have to reclaim memory through an HBA with no outstanding commands. Allocating two commands per bidirectional request hoses that guarantee ... it could be fixed up by increasing the reserve pool to 2, but that's adding further unwanted complexity ...

> 2. Force the users of BIDI to allocate scsi_data_buffer(s).
>    Users will allocate a slab for scsi_data_buffers and hang them on
>    next_rq->special beforehand, then free them together with the second
>    request. This approach can be very stable, but it is bad because it
>    is a layering violation. When the block and SCSI layers decide to
>    change the wiring of bidi, users will have to change with them.

I agree.

> 3. Let SCSI-ml manage a slab for scsi_data_buff's.
>    Have a flag to scsi_setup_command_freelist() or a new API. When a
>    device is flagged bidi-able we could 1) allocate bigger slots to
>    have extra memory for SDBs together with the command itself, or
>    2) allocate yet another slab for SDBs.

If you're worried about allocations, doing a single allocation is better.

> The 3rd approach is clean, and I can submit a patch for it. But I think
> it is not currently necessary. The first solution (see attached patch)
> we can all live with, I hope, since for me it gives me the stability I
> need, and for the rest of the kernel it is the minimum change, layered
> on existing building blocks.

There's actually a fourth option you haven't considered:

Roll all the required sglist definitions (request_bufflen, request_buffer, use_sg and sglist_len) into the sgtable pools.

We're getting very close to the point where someone gets to sweep through the drivers eliminating the now superfluous non-sg path in the queuecommand. When that happens the only cases become no transfer or SG backed commands. At this point we can do a consolidation of the struct scsi_cmnd fields. This does represent the ideal time to sweep the sg list handling fields into the sgtable and simply have a single pointer to struct sgtable in the scsi_cmnd (== NULL is the signal for a no transfer command).

James
Re: [PATCH v2] add bidi support for block pc requests
FUJITA Tomonori wrote:
> Here is an updated version of the patch to add bidi support to block
> pc requests. Bugs spotted by Benny were fixed.
>
> This patch can be applied cleanly to the scsi-misc git tree and is on
> top of the following patch to add linked request support:
>
> http://marc.info/?l=linux-scsi&m=117835587615642&w=2
>
> ---
> From: FUJITA Tomonori <[EMAIL PROTECTED]>
> Date: Mon, 7 May 2007 16:42:24 +0900
> Subject: [PATCH] add bidi support to scsi pc commands
>
> This adds bidi support to block pc requests.
>
> A bidi request uses the req->next_rq pointer for an in request.
>
> This patch introduces a new structure, scsi_data_buffer, to hold the
> data buffer information. To avoid making the scsi_cmnd structure
> fatter, the scsi mid-layer uses the cmnd->request->next_rq->special
> pointer for a scsi_data_buffer structure. LLDs don't touch the second
> request (req->next_rq) so it's safe to use req->special.
>
> scsi_blk_pc_done() always completes the whole command so
> scsi_end_request() simply completes the bidi chunk too.
>
> A helper function, scsi_bidi_data_buffer(), is for LLDs to access the
> scsi_data_buffer structure easily.
>
> Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]>
> ---
> @@ -685,6 +696,14 @@ static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int uptodate,
> 		}
> 	}
>
> +	/*
> +	 * a REQ_BLOCK_PC command is always completed fully so just
> +	 * end the bidi chunk.
> +	 */
> +	if (sdb)
> +		end_that_request_chunk(req->next_rq, uptodate,
> +				       sdb->request_bufflen);
> +

sdb->request_bufflen was zeroed in scsi_release_buffers, which is called before scsi_end_request.
> static void scsi_blk_pc_done(struct scsi_cmnd *cmd)
> {
> 	BUG_ON(!blk_pc_request(cmd->request));
> @@ -1090,10 +1159,22 @@ static void scsi_blk_pc_done(struct scsi_cmnd *cmd)
> static int scsi_setup_blk_pc_cmnd(struct scsi_device *sdev, struct request *req)
> {
> 	struct scsi_cmnd *cmd;
> +	struct scsi_data_buffer *sdb = NULL;
> +
> +	if (blk_bidi_rq(req)) {
> +		sdb = kzalloc(sizeof(*sdb), GFP_ATOMIC);
> +		if (unlikely(!sdb))
> +			return BLKPREP_DEFER;
> +	}
>
> 	cmd = scsi_get_cmd_from_req(sdev, req);
> -	if (unlikely(!cmd))
> +	if (unlikely(!cmd)) {
> +		kfree(sdb);
> 		return BLKPREP_DEFER;
> +	}
> +
> +	if (sdb)
> +		req->next_rq->special = sdb;
>
> 	/*
> 	 * BLOCK_PC requests may transfer data, in which case they must

Before I get to my main concern here I have one comment: in scsi_get_cmd_from_req() there is a code path in which a scsi_cmnd is taken from special and is not newly allocated. It is best to move the bidi allocation to scsi_get_cmd_from_req() and allocate a new sdb only if we get a new command. (See my attached patch for an example)

OK! My main concern is with kzalloc(sizeof(*sdb), GFP_ATOMIC);
All IO allocations should come from slabs in case of a memory-starved system.

There are 3 possible solutions I see:

1. Use struct scsi_cmnd for bidi_read and not a special scsi_data_buffer.
   Attached is the above solution. It is by far the simplest of the
   three. So simple that even Jens's BigIO patch can almost apply at
   scsi_lib.c (and vice versa).
   What's hung on request->next_rq->special is a second scsi_cmnd,
   taken from the regular command pool. This solution, though wasteful
   of some memory, is very stable.

2. Force the users of BIDI to allocate scsi_data_buffer(s).
   Users will allocate a slab for scsi_data_buffers and hang them on
   next_rq->special beforehand, then free them together with the second
   request. This approach can be very stable, but it is bad because it
   is a layering violation. When the block and SCSI layers decide to
   change the wiring of bidi, users will have to change with them.

3. Let SCSI-ml manage a slab for scsi_data_buff's.
   Have a flag to scsi_setup_command_freelist() or a new API. When a
   device is flagged bidi-able we could 1) allocate bigger slots to have
   extra memory for SDBs together with the command itself, or 2) allocate
   yet another slab for SDBs.

The 3rd approach is clean, and I can submit a patch for it. But I think it is not currently necessary. The first solution (see attached patch) we can all live with, I hope, since for me it gives me the stability I need, and for the rest of the kernel it is the minimum change, layered on existing building blocks.

If anyone wants to try it out, I put up the usual patchset that exercises this approach here:

http://www.bhalevy.com/open-osd/download/double_rq_bidi

Thanks
Boaz

From ac8f0d3df711c5d01a269fde6a4ecce3000c3f68 Mon Sep 17 00:00:00 2001
From: Boaz Harrosh <[EMAIL PROTECTED]>
Date: Tue, 8 May 2007 14:04:46 +0300
Subject: [PATCH] scsi-ml bidi support

At the block level a bidi request uses the req->next_rq pointer for a second bidi