Re: [BUG] BLKZEROOUT on dm-crypt container cause OOM / kernel panic
Never mind, I guess I was wrong, coz bio_add_page() is always called with "offset" = 0 here, so: offset == bv->bv_offset + bv->bv_len is never matched. Hence, if (bio->bi_vcnt >= bio->bi_max_vecs) return 0; will trigger the if-break. So maybe this is after all just a dm-crypt issue on handling multiple bios created with next_bio (a bio chain)? On 10 August 2017 at 07:27, Tom Yan <tom.t...@gmail.com> wrote: > I haven't really tested but I was aware of the commit before I send my > last email. It doesn't seem relevant to be honest, because it doesn't > change the fact that the inner loop wil only end if the whole request > has been looped over. So still one big bio. > > There are a few things that seem suspicious to me. First of all, the > inner loop has an if-break at its end that seem to practically do > nothing, especially in terms of making the inner loop end when the bio > reach its expected size (BIO_MAX_PAGES?). > > bi_size = bio_add_page(bio, ZERO_PAGE(0), sz, 0); > > If you take a look at bio_add_page(): > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/bio.c?h=v4.13-rc4#n820 > > it will basically always return "len", which is "sz" here, unchanged. > > if (bi_size < sz) > break; > > So this pretty much never happens, with two exceptions: > > if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED))) > return 0; > > which is most likely irrelevant in this case, > > if (bio->bi_vcnt >= bio->bi_max_vecs) > return 0; > > which seem to matter in this case. However: > > if (page == bv->bv_page && > offset == bv->bv_offset + bv->bv_len) { > bv->bv_len += len; > goto done; > } > ... > if (bio->bi_vcnt >= bio->bi_max_vecs) > return 0; > ... > bio->bi_vcnt++; > done: > bio->bi_iter.bi_size += len; > return len; > > So if the first condition in the above quote is matched, the exception > would never happen either. > > On 9 August 2017 at 21:38, h...@lst.de <h...@lst.de> wrote: >> Does commit 615d22a51c04856efe62af6e1d5b450aaf5cc2c0 >> "block: Fix __blkdev_issue_zeroout loop" fix the issue for you?
Re: [BUG] BLKZEROOUT on dm-crypt container cause OOM / kernel panic
I haven't really tested but I was aware of the commit before I send my last email. It doesn't seem relevant to be honest, because it doesn't change the fact that the inner loop wil only end if the whole request has been looped over. So still one big bio. There are a few things that seem suspicious to me. First of all, the inner loop has an if-break at its end that seem to practically do nothing, especially in terms of making the inner loop end when the bio reach its expected size (BIO_MAX_PAGES?). bi_size = bio_add_page(bio, ZERO_PAGE(0), sz, 0); If you take a look at bio_add_page(): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/bio.c?h=v4.13-rc4#n820 it will basically always return "len", which is "sz" here, unchanged. if (bi_size < sz) break; So this pretty much never happens, with two exceptions: if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED))) return 0; which is most likely irrelevant in this case, if (bio->bi_vcnt >= bio->bi_max_vecs) return 0; which seem to matter in this case. However: if (page == bv->bv_page && offset == bv->bv_offset + bv->bv_len) { bv->bv_len += len; goto done; } ... if (bio->bi_vcnt >= bio->bi_max_vecs) return 0; ... bio->bi_vcnt++; done: bio->bi_iter.bi_size += len; return len; So if the first condition in the above quote is matched, the exception would never happen either. On 9 August 2017 at 21:38, h...@lst.dewrote: > Does commit 615d22a51c04856efe62af6e1d5b450aaf5cc2c0 > "block: Fix __blkdev_issue_zeroout loop" fix the issue for you?
Re: [PATCH v2 2/2] sd: check BLK_DEF_MAX_SECTORS against max_dev_sectors
On 14 August 2016 at 17:00, Tom Yan <tom.t...@gmail.com> wrote: > > That won't really work. min_t() would, though the line is gonna be a > bit long; not sure if I can/should simply cast the type (unsigned int) > to BLK_DEF_MAX_SECTORS. And which braces are you referring to? > Oh you mean the else-clause braces. Hmm I thought the coding style was that if you needed braces in one clause, you should use braces for others in the condition even if it consists of only one line of statement. At least that is what I was told when I send patches for libata. So, should I use min_t() or just do a type casting in front to BLK_DEF_MAX_SECTORS? Also do I need to break the line at a certain point when it exceeds certain length? -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] libata-scsi: make sure Maximum Write Same Length is not too large
On 13 August 2016 at 04:56, Martin K. Petersen <martin.peter...@oracle.com> wrote: >>>>>> "Tom" == Tom Yan <tom.t...@gmail.com> writes: > > Tom, > >>> put_unaligned_be64(65535 * ATA_MAX_TRIM_RNUM / (sector_size / 512), >>> [36]); > > How many 8-byte ranges fit in a 4096-byte sector? The thing is, as of ACS-4, blocks that carry DSM/TRIM LBA Range Entries are always 512-byte. Honestly, I have no idea how that would work on a 4Kn SSD, if it is / will ever be a thing. Anyway I was NOT trying to bump the number entries allowed, but instead to decrease it accordingly for the increased sector size, only to make sure that the Maximum Write Same Length advertised here is gonna overflow the 32-bit limit when it is converted to a representation in bytes (i.e. discard_max_bytes and write_same_max_bytes), and probably also, the bio size. It's also a suggestion from me for Shaun to determine how the Maximum Write Same Length to be advertised, so that the size in bytes covered by a single SCT Write Same will stay at ~2G in spite of the logical sector size. > > Tom> So were you trying to pointing out something I am still missing, or > Tom> were you merely confirming I was right? > > I suggest you drop ATA_MAX_TRIM_RNUM and do: > > enum { > ATA_TRIM_BLOCKS_PER_RANGE = 65535, /* 0x blocks per range desc. */ > ATA_TRIM_RANGE_SIZE_SHIFT = 3, /* range descriptor is 8 bytes */ > }; > > put_unaligned_be64(ATA_TRIM_BLOCKS_PER_RANGE * >sector_size >> ATA_TRIM_RANGE_SIZE_SHIFT, [36]); > > Might be worthwhile to create an ata_max_lba_range_blocks() wrapper. > > -- > Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/2] sd: check BLK_DEF_MAX_SECTORS against max_dev_sectors
On 13 August 2016 at 05:37, Martin K. Petersenwrote: > > It would be pretty unusual for a device that is smart enough to report a > transfer length limit to be constrained to 1 MB and change. > Well, it is done pretty much for libata's SATL. Also since opt_xfer_blocks is checked against dev_max, I think it makes sense in logic that we do the equivalence for BLK_DEF_MAX_SECTORS. Not to mention that since we check rw_max against the controller limit, no reason not to make sure we check it against the device limit in both cases. > > If you want to fix this please drop the braces and do: > > rw_max = min(BLK_DEF_MAX_SECTORS, q->limits.max_dev_sectors); That won't really work. min_t() would, though the line is gonna be a bit long; not sure if I can/should simply cast the type (unsigned int) to BLK_DEF_MAX_SECTORS. And which braces are you referring to? > > -- > Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] sd: dynamically adjust SD_MAX_WS16_BLOCKS as per the actual logical block size
On 13 August 2016 at 04:26, Martin K. Petersen <martin.peter...@oracle.com> wrote: >>>>>> "Tom" == Tom Yan <tom.t...@gmail.com> writes: > > Tom, > > Tom> I don't really follow. What would this BLK_MAX_BIO_SECTORS be? It > Tom> doesn't appear to me that a static value is going to address the > Tom> problem I am addressing in this patch. > > 0x7f, the maximum number of block layer sectors that can be > expressed in a single bio. Hmm, so when we queue any of the limits, we convert a certain maximum number of physical sectors (which we has already been doing) to its corresponding maximum number of block layer sectors, and then make sure that number does not exceed 0x7f, right? > > -- > Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] libata-scsi: use dev->max_sectors from libata-core appropriately
On 12 August 2016 at 19:56,wrote: > > Also note that ATA_HORKAGE_MAX_SEC_LBA48 is not supposed to work > automatically anyway, even when max_hw_sectors is as high as 65535, > since the effective max_sectors will be set by the SCSI disk driver. > I missed the fact that ATA_HORKAGE_MAX_SEC_LBA48 is only (and should only be) used by a few ATAPI devices, and the SCSI cdrom devices will not touch max_sectors like the SCSI disk driver. Therefore, I guess I should really just use blk_queue_max_hw_sectors() to set both max_hw_sectors and max_sectors for ATAPI devices. Sending a v2. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/2] libata-core: do not set dev->max_sectors for LBA48 devices
On 12 August 2016 at 13:18, Tom Yan <tom.t...@gmail.com> wrote: > On 12 August 2016 at 10:01, Martin K. Petersen > <martin.peter...@oracle.com> wrote: >> >> Again, the point of max_hw_sectors and max_dev_sectors is to enforce the >> hard limits of controller and device respectively. Nothing else. >> > > Sounds like libata-scsi is doing something wrong then. It should not > set max_hw_sectors to dev->max_sectors that is set by libata-core: > >> blk_queue_max_hw_sectors(q, dev->max_sectors); Though we'll still have to "abuse" max_hw_sectors for ATAPI class devices, since neither the SATL or sr cares about VPD. The only devices that need ATA_HORKAGE_MAX_SEC_LBA48 are ATAPI devices as well. > > but instead it should report it as Maximum Transfer Length and let sd > set it as max_dev_sectors: > >> put_unaligned_be32(dev->max_sectors, [8]); > > While max_hw_sectors will be left untouched (in the case of AHCI, for > example, since its SCSI host template does not have max_sectors set; > so max_hw_sectors will be SCSI_DEFAULT_MAX_SECTORS). > > Although that means setting dev->max_sectors to a value larger than > 1024 will probably be a no-op, if that's really an issue, we should > have the host templates in libata updated. > > Make sense? -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/2] libata-core: do not set dev->max_sectors for LBA48 devices
On 12 August 2016 at 10:01, Martin K. Petersenwrote: > > Again, the point of max_hw_sectors and max_dev_sectors is to enforce the > hard limits of controller and device respectively. Nothing else. > Sounds like libata-scsi is doing something wrong then. It should not set max_hw_sectors to dev->max_sectors that is set by libata-core: > blk_queue_max_hw_sectors(q, dev->max_sectors); but instead it should report it as Maximum Transfer Length and let sd set it as max_dev_sectors: > put_unaligned_be32(dev->max_sectors, [8]); While max_hw_sectors will be left untouched (in the case of AHCI, for example, since its SCSI host template does not have max_sectors set; so max_hw_sectors will be SCSI_DEFAULT_MAX_SECTORS). Although that means setting dev->max_sectors to a value larger than 1024 will probably be a no-op, if that's really an issue, we should have the host templates in libata updated. Make sense? -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] libata-scsi: make sure Maximum Write Same Length is not too large
On 12 August 2016 at 09:34, Martin K. Petersenwrote: >> "Tom," == tom ty89 writes: > > Tom, > > + put_unaligned_be64(65535 * ATA_MAX_TRIM_RNUM / > + (sector_size / 512), [36]); > > MAXIMUM WRITE SAME LENGTH is in units of the device's logical block > size. Yeah that's why: + sector_size = ata_id_logical_sector_size(args->id); Where the logical sector size of the device is reported with the same function in ata_scsi_dev_config() and ata_scsiop_read_cap(). So were you trying to pointing out something I am still missing, or were you merely confirming I was right? > > -- > Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] libata-scsi: make sure Maximum Write Same Length is not too large
The patch isn't about how the request from the block layer will be processed (to form the SCSI commands). What it addresses is blk_queue_max_write_same_sectors() and blk_queue_max_discard_sectors() that are called in the SCSI disk driver. You can see that they are called with an input of the Maximum Write Same Length times (logical_block_size >> 9), which is to convert the number of sectors to an appropriate value for the block layer (512-byte block based). On 4Kn drives, the multiplier will be 4096 >> 9, which is 8. So if the reported Maximum Write Same Length is 4194240, when the value is passed onto the block layer to set the limits for a 4Kn drive, it will be 4194240 * 8. (And when this value is represented in bytes, it will be further multiplied by 512 and over 32-bit.) `logical_block_size >> 9` is pretty much the same thing as ``logical_block_size / 512`. I should have probably used the bit shift way instead. On 11 August 2016 at 17:04, Shaun Tancheff <sh...@tancheff.com> wrote: > On Thu, Aug 11, 2016 at 3:26 AM, <tom.t...@gmail.com> wrote: >> From: Tom Yan <tom.t...@gmail.com> >> >> Currently we advertise Maximum Write Same Length based on the >> maximum number of sectors that one-block TRIM payload can cover. >> The field are used to derived discard_max_bytes and >> write_same_max_bytes limits in the block layer, which currently can >> at max be 0x (32-bit). >> >> However, with a AF 4Kn drive, the derived limits would be 65535 * >> 64 * 4096 = 0x3fffc (34-bit). Therefore, we now devide >> ATA_MAX_TRIM_RNUM with (logical sector size / 512), so that the >> derived limits will not overflow. >> >> The limits are now also consistent among drives with different >> logical sector sizes. (Although that may or may not be what we >> want ultimately when the SCSI / block layer allows larger >> representation in the future.) >> >> Although 4Kn ATA SSDs may not be a thing on the market yet, this >> patch is necessary for forthcoming SCT Write Same translation >> support, which could be available on traditional HDDs where 4Kn is >> already a thing. Also it should not change the current behavior on >> drives with 512-byte logical sectors. >> >> Note: this patch is not about AF 512e drives. >> Signed-off-by: Tom Yan <tom.t...@gmail.com> >> >> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c >> index be9c76c..dcadcaf 100644 >> --- a/drivers/ata/libata-scsi.c >> +++ b/drivers/ata/libata-scsi.c >> @@ -2295,6 +2295,7 @@ static unsigned int ata_scsiop_inq_89(struct >> ata_scsi_args *args, u8 *rbuf) >> static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf) >> { >> u16 min_io_sectors; >> + u32 sector_size; >> >> rbuf[1] = 0xb0; >> rbuf[3] = 0x3c; /* required VPD size with unmap support */ >> @@ -2309,17 +2310,27 @@ static unsigned int ata_scsiop_inq_b0(struct >> ata_scsi_args *args, u8 *rbuf) >> min_io_sectors = 1 << ata_id_log2_per_physical_sector(args->id); >> put_unaligned_be16(min_io_sectors, [6]); >> >> - /* >> -* Optimal unmap granularity. >> -* >> -* The ATA spec doesn't even know about a granularity or alignment >> -* for the TRIM command. We can leave away most of the unmap related >> -* VPD page entries, but we have specifify a granularity to signal >> -* that we support some form of unmap - in thise case via WRITE SAME >> -* with the unmap bit set. >> -*/ >> + sector_size = ata_id_logical_sector_size(args->id); >> if (ata_id_has_trim(args->id)) { >> - put_unaligned_be64(65535 * ATA_MAX_TRIM_RNUM, [36]); >> + /* >> +* Maximum write same length. >> +* >> +* Avoid overflow in discard_max_bytes and >> write_same_max_bytes >> +* with AF 4Kn drives. Also make them consistent among drives >> +* with different logical sector sizes. >> +*/ >> + put_unaligned_be64(65535 * ATA_MAX_TRIM_RNUM / >> + (sector_size / 512), [36]); > > I think the existing fixups in sd_setup_discard_cmnd() and > sd_setup_write_same_cmnd() > are 'doing the right thing'. > > If I understand the stack correctly: > > libata-scsi.c (and sd.c) both report a maximum in terms of 512 byte sectors. > The upper layer stack works (mostly) on a mix of bytes and 512 byte sectors > agnostic of the underlying hard
Re: [PATCH v2 2/2] libata-core: do not set dev->max_sectors for LBA48 devices
On 11 August 2016 at 11:37, Martin K. Petersen <martin.peter...@oracle.com> wrote: >>>>>> "Tom" == Tom Yan <tom.t...@gmail.com> writes: > > I don't agree with conflating the optimal transfer size and the maximum > supported ditto. Submitting the largest possible I/O to a device does > not guarantee that you get the best overall performance. > > - max_hw_sectors is gated by controller DMA constraints. > > - max_dev_sectors is set for devices that explicitly report a transfer >length limit. > > - max_sectors, the soft limit for filesystem read/write requests, >should be left at BLK_DEF_MAX_SECTORS unless the device explicitly >requests transfers to be aligned multiples of a different value >(typically the internal stripe size in large arrays). Shouldn't we use Maximum Transfer Length to derive max_sectors (and get rid of the almost useless max_dev_sectors)? Honestly it looks pretty non-sensical to me that the SCSI disk driver uses Optimal Transfer Length for max_sectors. Also in libata's case, this make setting the effective max_sectors (e.g. see ATA_HORKAGE_MAX_SEC_LBA48) impossible if we do not want to touch io_opt. It would look to me that our block layer simply have a flawed design if we really need to derive both io_opt and max_sectors from the same field. > > The point of BLK_DEF_MAX_SECTORS is to offer a reasonable default for > common workloads unless otherwise instructed by the storage device. > > We can have a discussion about what the right value for > BLK_DEF_MAX_SECTORS should be. It has gone up over time but it used to > be the case that permitting large transfers significantly impacted > interactive I/O performance. And finding a sweet spot that works for a > wide variety of hardware, interconnects and workloads is obviously > non-trivial. > If BLK_DEF_MAX_SECTORS is supposed to be used as a fallback, then it should be a safe value, especially when max_sectors_kb can be adjusted through sysfs. But the biggest problem isn't on bumping it, but the value picked is totally irrational for a general default. I mean, given that it was 1024 (512k), try to double it? Fine. Try to quadruple it? Alright. We'll need to deal with some alignment / boundary issue (like the typical 65535 vs 65536 case)? Okay let's do it. But what's the sense in picking a random RAID configuartion as the base to decide the default? Also, if max_sectors need to concern about the number of disks used and chunk sizes in a RAID configuartion, it should be calculated in the device-mapper layer or a specific driver or so. Changing a block layer default won't help anyway. Say 2560 will accomodate a 10-disk 128k-chunk RAID. What about a 12-disk 128k-chunk RAID then? Why not just decide the value base on an 8-disk 128k-chunk RAID, which HAPPENED to be a double of 1024 as well? It does not make sense that the SCSI disk driver uses it as the fallback either. SCSI host templates that does not have max_sectors set (as well as some specific driver) will use SCSI_DEFAULT_MAX_SECTORS as the fallback, for such hosts max_hw_sectors will be 1024, where the current BLK_DEF_MAX_SECTORS cannot apply as max_sectors anyway. So we should use also SCSI_DEFAULT_MAX_SECTORS in the SCSI disk driver as fallback for max_sectors. If the value is considered to low even as a safe fallback, then it should be bumped appropriately. (Or we might want to replace it with BLK_DEF_MAX_SECTORS everywhere in the SCSI layer, that said, after the value is fixed.) > -- > Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)
On 10 August 2016 at 15:41, David Milburnwrote: > Hi, > > The 168 makes AHCI_CMD_TBL_SZ equal to 2816 > > AHCI_CMD_TBL_SZ = AHCI_CMD_TBL_HDR_SZ + (AHCI_MAX_SG * 16) > AHCI_CMD_TBL_SZ = 128 + (168 * 16) > > I think if you add in AHCI_CMD_SLOT_SZ (1024) and AHCI_RX_FIS_SZ (256) > the DMA is 4K aligned, I think that is where the 168 came from. Looks like the right guess. Though AHCI_PORT_PRIV_DMA_SZ is not: AHCI_CMD_SLOT_SZ (1024) + AHCI_CMD_TBL_SZ (2816) + AHCI_RX_FIS_SZ (256) = 4096 but: AHCI_CMD_SLOT_SZ (1024) + AHCI_CMD_TBL_AR_SZ (2816 * 32 = 90112) + AHCI_RX_FIS_SZ (256) = 91392 and AHCI_PORT_PRIV_FBS_DMA_SZ is: AHCI_CMD_SLOT_SZ (1024) + AHCI_CMD_TBL_AR_SZ (2816 * 32 = 90112) + AHCI_RX_FIS_SZ * 16 (4096) = 95232 > > Thanks, > David > > -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)
On 10 August 2016 at 11:26, Tejun Heowrote: > Hmmm.. why not? The hardware limit is 64k and the driver is using a Is that referring to the maximum number of entries allowed in the PRDT, Physical Region Descriptor Table (which is, more precisely, 65535)? > lower limit of 168 most likely because it doesn't make noticeable > difference beyond certain point and it determines the size of > contiguous memory which has to be allocated for the command table. > Each sg entry is 16 bytes. Pushing it to the hardware limit would > require an order 9 allocation for each port. That makes sense to me, and I didn't have the intention to push it to the limit anyway. > Not necessarily. A single sg entry can point to an area larger than > PAGE_SIZE. You mean the 4MB limit of "Data Byte Count" in "DW3: Description Information" of the PRDT? Is that what max_segment_size (which is set to a general fallback of 65536: http://lxr.free-electrons.com/ident?i=dma_get_max_seg_size) is about in this case? And my point was, it will be a multiple of 168 anyway, if 1344 is just an example. > As written above, that probably makes the ahci command table size > nicely aligned. I think that's what bothers me ultimately, cause I don't see how 168 makes it (more) nicely aligned (or even, aligned to what?). I even checked out the AHCI driver of FreeBSD (https://github.com/freebsd/freebsd/blob/master/sys/dev/ahci/ahci.h): ... #define MAXPHYS 512 * 1024 ... #define AHCI_SG_ENTRIES (roundup(btoc(MAXPHYS) + 1, 8)) ... #define AHCI_CT_SIZE (128 + AHCI_SG_ENTRIES * 16) ... Couldn't get the sense out of the `+ 1` and round up to 8 thing either. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] libata-scsi: do not call blk_queue_max_hw_sectors()
Withdrawn. blk_queue_max_hw_sectors() is not called in hosts.c but in scsi_lib.c. However, it does not check the dev->max_sectors set in libata-core.c. So everything of this patch is wrong. Will rewrite and resend the second patch. On 9 August 2016 at 18:31, <tom.t...@gmail.com> wrote: > From: Tom Yan <tom.t...@gmail.com> > > We should just let the scsi driver (hosts.c) call the function. It > has better heuristic anyway (i.e. use SCSI_DEFAULT_MAX_SECTORS as > fallback when max_sectors is not set). > > diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c > index 2bdb5da..495d916 100644 > --- a/drivers/ata/libata-scsi.c > +++ b/drivers/ata/libata-scsi.c > @@ -1204,9 +1204,6 @@ static int ata_scsi_dev_config(struct scsi_device *sdev, > if (!ata_id_has_unload(dev->id)) > dev->flags |= ATA_DFLAG_NO_UNLOAD; > > - /* configure max sectors */ > - blk_queue_max_hw_sectors(q, dev->max_sectors); > - > if (dev->class == ATA_DEV_ATAPI) { > void *buf; > > -- > 2.9.2 > -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)
So the (not so) recent bump of BLK_DEF_MAX_SECTORS from 1024 to 2560 (commit d2be537c3ba3) seemed to have caused trouble to some of the ATA devices, which were then worked around with ATA_HORKAGE_MAX_SEC_1024. However, I am suspecting that the bump of BLK_DEF_MAX_SECTORS is not the "real" cause of the trouble, but the fact that AHCI_MAX_SG has been set to a weird value of 168 (with a comment "hardware max is 64K", which neither seem to make any sense). AHCI_MAX_SG is used to set the sg_tablesize (i.e. max_segments, apparently), which is apparently used to derive the actual "request size" (that is, if it is lower than max_sectors(_kb), it will be the limiting factor instead). For example, no matter if the drive has max_sectors set to 2560, or to 65535 (by adding it as the Optimal Transfer Length to libata's SATL, which is also max_hw_sectors that is set from ATA_MAX_SECTORS_LBA48), "avgrq-sz" in `iostat` will be capped at 1344 (168 * 8). However, if I change AHCI_MAX_SG to 128 (which is also the sg_tablesize set in libata.h from LIBATA_MAX_PRD), "avgrq-sz" in `iostat` will be capped at 1024 (128 * 8), which should make ATA_HORKAGE_MAX_SEC_1024 unnecessary. So why has AHCI_MAX_SG been set to 168 anyway? -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html