re: [SCSI] fnic: FIP VLAN Discovery Feature Support
Hello Hiral Patel, The patch d3c995f1dcf9: "[SCSI] fnic: FIP VLAN Discovery Feature Support" from Feb 25, 2013, leads to the following static checker warning: drivers/scsi/fnic/fnic_fcs.c:278 is_fnic_fip_flogi_reject() warn: is it ok to set 'els_len' to negative? This is some unpushable debug code on my Smatch system. It is sort of a nonsense warning because Smatch has run into an "impossible" condition because of a bug in Smatch but also because of bug in fnic_fcoe_send_vlan_req(). drivers/scsi/fnic/fnic_fcs.c 251 fiph = (struct fip_header *)skb->data; ^ fiph is untrusted data. We don't trust the network. This is a rule of Linux. 252 op = ntohs(fiph->fip_op); 253 sub = fiph->fip_subcode; 254 255 if (op != FIP_OP_LS) 256 return 0; 257 258 if (sub != FIP_SC_REP) 259 return 0; 260 261 rlen = ntohs(fiph->fip_dl_len) * 4; 262 if (rlen + sizeof(*fiph) > skb->len) 263 return 0; 264 265 desc = (struct fip_desc *)(fiph + 1); 266 dlen = desc->fip_dlen * FIP_BPW; desc->fip_dlen is u8 type so it could be a number between 0-255. That's fine. FIP_BPW is 4. Here is the bug in Smatch. It looks at the sender code and says that desc->fip_dlen is set to either 2 or 4. So now dlen is in 8-16 range. 267 268 if (desc->fip_dtype == FIP_DT_FLOGI) { 269 270 shost_printk(KERN_DEBUG, lport->host, 271" FIP TYPE FLOGI: fab name:%llx " 272"vfid:%d map:%x\n", 273fip->sel_fcf->fabric_name, fip->sel_fcf->vfid, 274fip->sel_fcf->fc_map); 275 if (dlen < sizeof(*els) + sizeof(*fh) + 1) ^ 8-16 is always less than 29 so this condition is always true. 276 return 0; 277 278 els_len = dlen - sizeof(*els); ^ We are in an impossible condition and Smatch doesn't know the possible values of "dlen" any more so it generates the warning. 279 els = (struct fip_encaps *)desc; 280 fh = (struct fc_frame_header *)(els + 1); 281 els_dtype = desc->fip_dtype; 282 283 if (!fh) 284 return 0; 285 286 /* 287 * ELS command code, reason and explanation should be = Reject, 288 * unsupported command and insufficient resource 289 */ 290 els_op = *(u8 *)(fh + 1); 291 if (els_op == ELS_LS_RJT) { 292 shost_printk(KERN_INFO, lport->host, 293"Flogi Request Rejected by Switch\n"); 294 return 1; 295 } 296 shost_printk(KERN_INFO, lport->host, 297 "Flogi Request Accepted by Switch\n"); 298 } 299 return 0; 300 } 301 302 static void fnic_fcoe_send_vlan_req(struct fnic *fnic) 303 { 304 struct fcoe_ctlr *fip = &fnic->ctlr; 305 struct fnic_stats *fnic_stats = &fnic->fnic_stats; 306 struct sk_buff *skb; 307 char *eth_fr; 308 int fr_len; 309 struct fip_vlan *vlan; 310 u64 vlan_tov; 311 312 fnic_fcoe_reset_vlans(fnic); 313 fnic->set_vlan(fnic, 0); 314 FNIC_FCS_DBG(KERN_INFO, fnic->lport->host, 315"Sending VLAN request...\n"); 316 skb = dev_alloc_skb(sizeof(struct fip_vlan)); 317 if (!skb) 318 return; 319 320 fr_len = sizeof(*vlan); 321 eth_fr = (char *)skb->data; 322 vlan = (struct fip_vlan *)eth_fr; 323 324 memset(vlan, 0, sizeof(*vlan)); 325 memcpy(vlan->eth.h_source, fip->ctl_src_addr, ETH_ALEN); 326 memcpy(vlan->eth.h_dest, fcoe_all_fcfs, ETH_ALEN); 327 vlan->eth.h_proto = htons(ETH_P_FIP); 328 329 vlan->fip.fip_ver = FIP_VER_ENCAPS(FIP_VER); 330 vlan->fip.fip_op = htons(FIP_OP_VLAN); 331 vlan->fip.fip_subcode = FIP_SC_VL_REQ; 332 vlan->fip.fip_dl_len = htons(sizeof(vlan->desc) / FIP_BPW); 333 334 vlan->desc.mac.fd_desc.fip_dtype = FIP_DT_MAC; 335 vlan->desc.mac.fd_desc.fip_dlen = sizeof(vlan->desc.mac) / FIP_BPW; 8 / 4 = 2. We are setting ->fib_dlen to 2. 336 memcpy(&vlan->desc.mac.fd_mac, fip->ctl_
RE: [patch] [SCSI] megaraid: missing bounds check in mimd_to_kioc()
>-Original Message- >From: Dan Carpenter [mailto:dan.carpen...@oracle.com] >Sent: Wednesday, October 30, 2013 10:44 PM >To: DL-MegaRAID Linux >Cc: James E.J. Bottomley; linux-scsi@vger.kernel.org; secur...@kernel.org; >Nico Golde; Fabian Yamaguchi >Subject: [patch] [SCSI] megaraid: missing bounds check in mimd_to_kioc() > >pthru32->dataxferlen comes from the user so we need to check that it's >not too large so we don't overflow the buffer. > >Reported-by: Nico Golde >Reported-by: Fabian Yamaguchi >Signed-off-by: Dan Carpenter >--- >Please review this carefully because I have not tested it. > >diff --git a/drivers/scsi/megaraid/megaraid_mm.c >b/drivers/scsi/megaraid/megaraid_mm.c >index dfffd0f..a706927 100644 >--- a/drivers/scsi/megaraid/megaraid_mm.c >+++ b/drivers/scsi/megaraid/megaraid_mm.c >@@ -486,6 +486,8 @@ mimd_to_kioc(mimd_t __user *umimd, >mraid_mmadp_t *adp, uioc_t *kioc) > > pthru32->dataxferaddr = kioc->buf_paddr; > if (kioc->data_dir & UIOC_WR) { >+ if (pthru32->dataxferlen > kioc->xferlen) >+ return -EINVAL; > if (copy_from_user(kioc->buf_vaddr, kioc->user_data, > pthru32->dataxferlen)) { > return (-EFAULT); Acked-by: Sumit Saxena Sumit -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] megaraid_sas: make the log message about rescanning more informative.
>-Original Message- >From: Jasper Spaans [mailto:spa...@fox-it.com] >Sent: Friday, October 25, 2013 5:53 PM >To: DL-MegaRAID Linux >Cc: linux-scsi@vger.kernel.org >Subject: [PATCH] megaraid_sas: make the log message about rescanning >more informative. > >Hi, > >I was working on one of our servers, and saw the following interesting >message in dmesg: > >[20961024.720972] scanning ... > >This scared me somewhat. Please find a patch to make it easier to see where >this message is coming from. > >Cheers, >Jasper > >--- > drivers/scsi/megaraid/megaraid_sas_base.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > >diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c >b/drivers/scsi/megaraid/megaraid_sas_base.c >index 1f0ca68..3457289 100644 >--- a/drivers/scsi/megaraid/megaraid_sas_base.c >+++ b/drivers/scsi/megaraid/megaraid_sas_base.c >@@ -5461,7 +5461,7 @@ megasas_aen_polling(struct work_struct *work) > } > > if (doscan) { >- printk(KERN_INFO "scanning ...\n"); >+ printk(KERN_INFO "Rescanning MegaRAID devices ...\n"); > megasas_get_pd_list(instance); > for (i = 0; i < MEGASAS_MAX_PD_CHANNELS; i++) { > for (j = 0; j < MEGASAS_MAX_DEV_PER_CHANNEL; >j++) { >-- >1.7.9.5 Acked-by: Sumit Saxena Sumit > > >-- > /\/\ ir. Jasper Spaans // Lead Developer www.FoxDetACT.com > \ (_)/ Fox-IT - For a more secure society! > \XT: +31-15-2847999 > \ / \ M: +31-6-41588725 >\/ KvK Haaglanden 27301624 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] [SCSI] megaraid: use GFP_ATOMIC under spin lock
>-Original Message- >From: Wei Yongjun [mailto:weiyj...@gmail.com] >Sent: Friday, December 20, 2013 8:38 AM >To: DL-MegaRAID Linux; jbottom...@parallels.com; simon.pu...@gmail.com; >jkos...@suse.cz >Cc: yongjun_...@trendmicro.com.cn; linux-scsi@vger.kernel.org >Subject: [PATCH] [SCSI] megaraid: use GFP_ATOMIC under spin lock > >From: Wei Yongjun > >A spin lock is taken here so we should use GFP_ATOMIC. > >Signed-off-by: Wei Yongjun >--- > drivers/scsi/megaraid/megaraid_mm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > >diff --git a/drivers/scsi/megaraid/megaraid_mm.c >b/drivers/scsi/megaraid/megaraid_mm.c >index a706927..99fa5d3 100644 >--- a/drivers/scsi/megaraid/megaraid_mm.c >+++ b/drivers/scsi/megaraid/megaraid_mm.c >@@ -570,7 +570,7 @@ mraid_mm_attach_buf(mraid_mmadp_t *adp, uioc_t >*kioc, int xferlen) > > kioc->pool_index= right_pool; > kioc->free_buf = 1; >- kioc->buf_vaddr = pci_pool_alloc(pool->handle, GFP_KERNEL, >+ kioc->buf_vaddr = pci_pool_alloc(pool->handle, >GFP_ATOMIC, > &kioc->buf_paddr); > spin_unlock_irqrestore(&pool->lock, flags); > > Acked-by: Sumit Saxena Sumit -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [SCSI] fnic: FIP VLAN Discovery Feature Support
Hiral isn't at Cisco any more. regards, dan carpenter On Wed, Jan 08, 2014 at 12:57:41PM +0300, Dan Carpenter wrote: > Hello Hiral Patel, > > The patch d3c995f1dcf9: "[SCSI] fnic: FIP VLAN Discovery Feature > Support" from Feb 25, 2013, leads to the following > static checker warning: > > drivers/scsi/fnic/fnic_fcs.c:278 is_fnic_fip_flogi_reject() > warn: is it ok to set 'els_len' to negative? > > This is some unpushable debug code on my Smatch system. It is sort of > a nonsense warning because Smatch has run into an "impossible" condition > because of a bug in Smatch but also because of bug in > fnic_fcoe_send_vlan_req(). > > drivers/scsi/fnic/fnic_fcs.c >251 fiph = (struct fip_header *)skb->data; > ^ > fiph is untrusted data. We don't trust the network. This is a rule of > Linux. > >252 op = ntohs(fiph->fip_op); >253 sub = fiph->fip_subcode; >254 >255 if (op != FIP_OP_LS) >256 return 0; >257 >258 if (sub != FIP_SC_REP) >259 return 0; >260 >261 rlen = ntohs(fiph->fip_dl_len) * 4; >262 if (rlen + sizeof(*fiph) > skb->len) >263 return 0; >264 >265 desc = (struct fip_desc *)(fiph + 1); >266 dlen = desc->fip_dlen * FIP_BPW; > > desc->fip_dlen is u8 type so it could be a number between 0-255. That's > fine. > FIP_BPW is 4. > Here is the bug in Smatch. It looks at the sender code and says that > desc->fip_dlen is set to either 2 or 4. So now dlen is in 8-16 range. > >267 >268 if (desc->fip_dtype == FIP_DT_FLOGI) { >269 >270 shost_printk(KERN_DEBUG, lport->host, >271" FIP TYPE FLOGI: fab name:%llx " >272"vfid:%d map:%x\n", >273fip->sel_fcf->fabric_name, > fip->sel_fcf->vfid, >274fip->sel_fcf->fc_map); >275 if (dlen < sizeof(*els) + sizeof(*fh) + 1) > ^ > 8-16 is always less than 29 so this condition is always true. > >276 return 0; >277 >278 els_len = dlen - sizeof(*els); > ^ > We are in an impossible condition and Smatch doesn't know the possible > values of "dlen" any more so it generates the warning. > >279 els = (struct fip_encaps *)desc; >280 fh = (struct fc_frame_header *)(els + 1); >281 els_dtype = desc->fip_dtype; >282 >283 if (!fh) >284 return 0; >285 >286 /* >287 * ELS command code, reason and explanation should be > = Reject, >288 * unsupported command and insufficient resource >289 */ >290 els_op = *(u8 *)(fh + 1); >291 if (els_op == ELS_LS_RJT) { >292 shost_printk(KERN_INFO, lport->host, >293"Flogi Request Rejected by > Switch\n"); >294 return 1; >295 } >296 shost_printk(KERN_INFO, lport->host, >297 "Flogi Request Accepted by Switch\n"); >298 } >299 return 0; >300 } >301 >302 static void fnic_fcoe_send_vlan_req(struct fnic *fnic) >303 { >304 struct fcoe_ctlr *fip = &fnic->ctlr; >305 struct fnic_stats *fnic_stats = &fnic->fnic_stats; >306 struct sk_buff *skb; >307 char *eth_fr; >308 int fr_len; >309 struct fip_vlan *vlan; >310 u64 vlan_tov; >311 >312 fnic_fcoe_reset_vlans(fnic); >313 fnic->set_vlan(fnic, 0); >314 FNIC_FCS_DBG(KERN_INFO, fnic->lport->host, >315"Sending VLAN request...\n"); >316 skb = dev_alloc_skb(sizeof(struct fip_vlan)); >317 if (!skb) >318 return; >319 >320 fr_len = sizeof(*vlan); >321 eth_fr = (char *)skb->data; >322 vlan = (struct fip_vlan *)eth_fr; >323 >324 memset(vlan, 0, sizeof(*vlan)); >325 memcpy(vlan->eth.h_source, fip->ctl_src_addr, ETH_ALEN); >326 memcpy(vlan->eth.h_dest, fcoe_all_fcfs, ETH_ALEN); >327 vlan->eth.h_proto = htons(ETH_P_FIP); >328 >329 vlan->fip.fip_ver = FIP_VER_ENCAPS(FIP_VER); >330 vlan->fip.fip_op = htons(FIP_OP_VLAN); >331 vlan->fip.fip_subcode =
Re: Terrible performance of sequential O_DIRECT 4k writes in SAN environment. ~3 times slower then Solars 10 with the same HBA/Storage.
On Wed, Jan 08, 2014 at 02:17:13AM +0100, Jan Kara wrote: > Well, I was specifically worried about i_mutex locking. In particular: > Before we report appending IO completion we need to update i_size. > To update i_size we need to grab i_mutex. > > Now this is unpleasant because inode_dio_wait() happens under i_mutex so > the above would create lock inversion. And we cannot really do > inode_dio_done() before grabbing i_mutex as that would open interesting > races between truncate decreasing i_size and DIO increasing it. Yeah, XFS splits this between the ilock and iolock, which just makes life in this area a whole lot easier. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Terrible performance of sequential O_DIRECT 4k writes in SAN environment. ~3 times slower then Solars 10 with the same HBA/Storage.
On Tue, Jan 07, 2014 at 08:37:23PM +0200, Sergey Meirovich wrote: > Actually my initial report (14.67Mb/sec 3755.41 Requests/sec) was about ext4 > However I have tried XFS as well. It was a bit slower than ext4 on all > occasions. I wasn't trying to say XFS fixes your problem, but that we could implement appending AIO writes in XFS fairly easily. To verify Jan's theory, can you try to preallocate the file to the full size and then run the benchmark by doing a: # fallocate -l and then run it? If that's indeed the issue I'd be happy to implement the "real aio" append support for you as well. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Terrible performance of sequential O_DIRECT 4k writes in SAN environment. ~3 times slower then Solars 10 with the same HBA/Storage.
Hi James, On 7 January 2014 22:57, James Smart wrote: > Sergey, > > The Thor chipset is a bit old - a 4Gig adapter. Most of our performance > improvements, including parallelization, have gone into the 8G and 16G > adapters. But you still should have seen significantly beyond what you > reported. First of all - thanks a lot! I took Thor because we have exactly the same Thors in some of our Solaris servers. I've also tried 6 different qlogics (mostly 8G) and fnic (10G) as well. Surprisingly enough Thor was the fastest one for seqwr 4k. Though in most of the cases machines were from our different DCs and hence each one connected to yet another storage. > > We did a sanity check some hardware we already had set up with a Thor > adapter. We saw 23555 iop/s and 92.1 MB/s without needing to do much, well > beyond what you've reported, and still not up to what we know the card can > do. There are some inefficiencies from the linux kernel and some locking > deltas between our solaris and linux drivers - but not enough to account for > what you are seeing. > > I expect the Direct IO filesystem behavior is the root issue. The strangest thing to me that this is the problem with sequential write. For example the fnic one machine is zoned to EMC XtremIO and had results: 14.43Mb/sec 3693.65 Requests/sec for sequential 4k. The same fnic machine perfrormed rather impressive for random 4k 451.11Mb/sec 115485.02 Requests/sec -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] drivers: target: target_core_mod: use div64_u64_rem() instead of operator '%' for u64
On 01/08/2014 03:32 PM, Hannes Reinecke wrote: > On 12/24/2013 04:35 AM, Chen Gang wrote: >> On 12/23/2013 02:51 PM, Nicholas A. Bellinger wrote: >>> On Sun, 2013-12-22 at 17:17 +0800, Chen Gang wrote: On 12/22/2013 10:56 AM, Nicholas A. Bellinger wrote: > Hi Chen, > > On Sat, 2013-12-21 at 10:08 +0800, Chen Gang wrote: >> In kernel, need use div64_u64_rem() instead of operator '%' for u64, or >> can not pass compiling (with allmodconfig under metag): >> >> MODPOST 2909 modules >> ERROR: "__umoddi3" [drivers/target/target_core_mod.ko] undefined! >> >> Also need u64 type cast for u32 variable multiply u32 variable, or will >> cause type overflow issue. >> >> >> Signed-off-by: Chen Gang >> --- >> drivers/target/target_core_alua.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> > > FYI, this unsigned long long division in core_alua_state_lba_dependent() > was fixed for 32-bit in linux-next >= 12192013 code. > OK, thanks. The related fix patch changed "start_lba = lba % ..." to "start_lba = lba / ...", and also assumed "segment_size * segment_mult" is still within u32 (can not cause type over flow). I guess the original author already knew about them, and intended to do like that (if not, please let me know, thanks). >>> >>> Sorry, your correct that the original code is using modulo division to >>> calculate start_lba. >>> >> >> Oh, that's all right, (in fact, don't need sorry), I am not quite >> familiar with the details, so need related member help check it. :-) >> >>> Hannes, can you please double check this below..? >>> >> >> Please help check when have time, thanks. >> > I would even convert segment_size and segment_mult to u64, > to ensure no overflows occur: > > diff --git a/drivers/target/target_core_alua.c > b/drivers/target/target_core_alua > .c > index 9b1856d..54b1e52 100644 > --- a/drivers/target/target_core_alua.c > +++ b/drivers/target/target_core_alua.c > @@ -477,8 +477,7 @@ static inline int core_alua_state_lba_dependent( > u8 *alua_ascq) > { > struct se_device *dev = cmd->se_dev; > - u32 segment_size, segment_mult, sectors; > - u64 lba; > + u64 segment_size, segment_mult, sectors, lba; > > /* Only need to check for cdb actually containing LBAs */ > if (!(cmd->se_cmd_flags & SCF_SCSI_DATA_CDB)) > > OK, thanks. > Other than that the sector_div() patch is correct. > So we really need use '/' instead of original '%'? Thanks -- Chen Gang Open, share and attitude like air, water and life which God blessed -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Terrible performance of sequential O_DIRECT 4k writes in SAN environment. ~3 times slower then Solars 10 with the same HBA/Storage.
Hi Christoph, On 8 January 2014 16:03, Christoph Hellwig wrote: > On Tue, Jan 07, 2014 at 08:37:23PM +0200, Sergey Meirovich wrote: >> Actually my initial report (14.67Mb/sec 3755.41 Requests/sec) was about ext4 >> However I have tried XFS as well. It was a bit slower than ext4 on all >> occasions. > > I wasn't trying to say XFS fixes your problem, but that we could > implement appending AIO writes in XFS fairly easily. > > To verify Jan's theory, can you try to preallocate the file to the full > size and then run the benchmark by doing a: > > # fallocate -l > > and then run it? If that's indeed the issue I'd be happy to implement > the "real aio" append support for you as well. > After fallocate: [root@illin01 ext4]# du -k test_file.* | awk '{print $1}' |sort |uniq 81920 [root@illin01 ext4]# fallocate -l 81920k test_file.* Results are almost the same: 14.68Mb/sec 3758.02 Requests/sec -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: status of block-integrity
> "Hannes" == Hannes Reinecke writes: Hannes, Hannes> As there is no user (apart from oracleasm) no-one can attach Hannes> protection information to any data, so even the most dedicated Hannes> admin cannot exercise this path, let alone find issues here. That's not how it works! If the filesystem has not attached protection information to a bio the block layer will do it for you. The block layer generates protection information for writes and verifies it for reads. That's how it's worked since day one. The code is there, it is used by everyone with a DIX-capable HBA. See Documentation/block/data-integrity.txt. Normal applications do not want to have to deal with generating protection information, using an async I/O model, keeping completion state around for extended periods of time to figure out whether the I/O actually completed or not and so on. So the kernel-to-platter protection scheme we have in place now is good enough. That doesn't mean that I'm not interested in augmenting libaio. I am. Very. And I know of several applications that are keen to use it. But getting page cache passthrough and filesystem interaction working is non-trivial. That's what has inhibited progress, not extending the libaio API. Hannes> Doug Gilbert and I are currently discussing LID4 / ROD Token Hannes> copy for sg3_utils and the block layer, so any patches would be Hannes> very helpful here. I'm only doing LID1 right now. Any particular reason you are exploring LID4 and ROD? I resumed my efforts before Christmas but I keep running into issues. I'm guessing I'm a week or two from having something that is suitable for consumption. -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Terrible performance of sequential O_DIRECT 4k writes in SAN environment. ~3 times slower then Solars 10 with the same HBA/Storage.
On Wed, Jan 08, 2014 at 04:43:07PM +0200, Sergey Meirovich wrote: > Results are almost the same: > 14.68Mb/sec 3758.02 Requests/sec > On my laptop SSD I get the following results (sometimes up to 200MB/s, sometimes down to 100MB/s, always in the 40k to 50k IOps range): time elapsed (sec.):5 bandwidth (MiB/s): 160.00 IOps: 40960.00 The IOps are more than the hardware is physically capable of, but given that you didn't specify O_SYNC this seems sensible given that we never have to flush the disk cache. Could it be that your array has WCE=0? In Linux we'll never enable the cache automatically, but Solaris does at least when using ZFS. Try running: sdparm --set=WCE /dev/sdX and try again. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: status of block-integrity
> "James" == James Bottomley writes: James> No, I think you're confusing algorithms with protocols. DIF and James> DIX are two names for protection envelopes. DIF verifies James> integrity from the HBA to the device surface. DIX verifies James> integrity from an application to the HBA. Actually, DIX is a data integrity-aware HBA programming interface. We have an implementation of that interface in the SCSI layer and in some of the initiator drivers (lpfc, qla2xxx, mptNsas). There is no single name for stuff above DIX. Other than "block layer data integrity goo", "page cache black magic" and "let's add a few fields to struct iocb". James> So, the question is do we need to bother with DIX at all? No James> filesystem uses it ...explicitly. Every filesystem uses it implicitly. There are only two reasons for filesystems to want to be explicitly "block layer data integrity goo"-aware: 1. To be able to use the application tag space for back pointers or other metadata without requiring disk format changes. 2. To facilitate passthrough of protection information submitted via the $TBD application programming interface. I was hoping the extN folks would be interested in (1) but there were no takers. (2) is hard but not forgotten. In any case the status quo is that there is no point in filesystems manually generating protection information when the block layer is going to do it for them when the bio is submitted. -- Martin K. Petersen Oracle Linux Engineering -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/2] dual scan thread bug fix
On Wed, 8 Jan 2014, James Bottomley wrote: > OK, Agreed, but that means modifying the 1/2 patch with the below. This > should make the proposed diff to 2/2 correct. > > James > > --- > diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c > index ef3f958..5fad646 100644 > --- a/drivers/scsi/scsi_scan.c > +++ b/drivers/scsi/scsi_scan.c > @@ -417,7 +417,7 @@ static struct scsi_target *scsi_alloc_target(struct > device *parent, > + shost->transportt->target_size; > struct scsi_target *starget; > struct scsi_target *found_target; > - int error; > + int error, ref_got; > > starget = kzalloc(size, GFP_KERNEL); > if (!starget) { > @@ -466,15 +466,15 @@ static struct scsi_target *scsi_alloc_target(struct > device *parent, > return starget; > > found: > - if (!kref_get_unless_zero(&found_target->reap_ref)) > - /* > - * release routine already fired. Target is dead, but > - * STARGET_DEL may not yet be set (set in the release > - * routine), so set here as well, just in case > - */ > - found_target->state = STARGET_DEL; > + /* > + * release routine already fired if kref is zero, so if we can still > + * take the reference, the target must be alive. If we can't, it must > + * be dying and we need to wait for a new target > + */ > + ref_got = kref_get_unless_zero(&found_target->reap_ref); > + > spin_unlock_irqrestore(shost->host_lock, flags); > - if (found_target->state != STARGET_DEL) { > + if (ref_got) { > put_device(dev); > return found_target; > } > @@ -482,8 +482,8 @@ static struct scsi_target *scsi_alloc_target(struct > device *parent, >* Unfortunately, we found a dying target; need to wait until it's >* dead before we can get a new one. There is an anomaly here. We >* *should* call scsi_target_reap() to balance the kref_get() of the > - * reap_ref above. However, since the target is in state STARGET_DEL, > - * it's already invisible and the reap_ref is irrelevant. If we call > + * reap_ref above. However, since the target being released, it's > + * already invisible and the reap_ref is irrelevant. If we call >* scsi_target_reap() we might spuriously do another device_del() on >* an already invisible target. >*/ In fact, most of this comment (everything after the first sentence) is no longer needed. If the target is dying then kref_get_unless_zero must have failed, so there is no need to worry about unbalanced refcounts. Otherwise this looks good. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst
> From: Sarah Sharp > On Tue, Jan 07, 2014 at 03:57:00PM -0800, walt wrote: > > On 01/07/2014 01:21 PM, Sarah Sharp wrote: > > > > > Can you please try the attached patch, on top of the previous three > > > patches, and send me dmesg? > > > > Hi Sarah, I just now finished running 0001-More-debugging.patch for the > > first time. The previous dmesg didn't include that patch, but this one > > does. > > > > I read through this dmesg but I nodded off somewhere around line 500. > > I hope you can stay awake :) > > Well, it has all the info I need, but the results don't make me too > happy. Everything I've checked seems consistent, and I don't know why > the host stopped. The link TRBs are intact, the dequeue pointer for the > endpoint was pointing to the transfer that timed out and it had the > cycle bit set correctly, etc. Perhaps the no-op TRBs are really the > issue. > > I'll have to take a look at the log again tomorrow. I posted the dmesg > on pastebin if David wants to check it out as well: > http://pastebin.com/a4AUpsL1 I can't see anything obvious either. However there is no response to the 'stop endpoint' command. Section 4.6.9 (page 107 of rev1.0) states that the controller will complete any USB IN or OUT transaction before raising the command completion event. Possibly it is too 'stuck' to complete the transaction? The endpoint status is also still '1' (running). This also means that the 'TR dequeue pointer' is undefined - so the controller could easily be processing a later TRB. This field might even still contain the ring base address written by the driver much earlier. This might mean that something 'catastrophic' has happened earlier. Maybe the controller isn't actually seeing any doorbell writes at all. Maybe the base addresses it has for the rings have all got corrupted. At least this looks like amd64 - so there aren't memory coherency issues. Some hacks that might help isolate the problem: 1) Request an interrupt from the last nop data TRB. 2) Put a command nop (decimal 23) TRB into the command ring before the 'stop endpoint'. 3) Comment out the code that adds the nop data TRBs. The first two might need code adding to handle the responses. Do we know the actual xhci device? I think it reports version 0x96. (Sarah - it might be useful if that version were in one of the trace messages that is output by default.) David -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst
On Tue, 7 Jan 2014, Sarah Sharp wrote: > On Tue, Jan 07, 2014 at 03:57:00PM -0800, walt wrote: > > On 01/07/2014 01:21 PM, Sarah Sharp wrote: > > > > > Can you please try the attached patch, on top of the previous three > > > patches, and send me dmesg? > > > > Hi Sarah, I just now finished running 0001-More-debugging.patch for the > > first time. The previous dmesg didn't include that patch, but this one > > does. > > > > I read through this dmesg but I nodded off somewhere around line 500. > > I hope you can stay awake :) > > Well, it has all the info I need, but the results don't make me too > happy. Everything I've checked seems consistent, and I don't know why > the host stopped. The link TRBs are intact, the dequeue pointer for the > endpoint was pointing to the transfer that timed out and it had the > cycle bit set correctly, etc. Perhaps the no-op TRBs are really the > issue. This may be a foolish question, but why is xhci-hcd using no-op TRBs in the first place? It makes sense that they would be needed if you have to unlink an URB that isn't the first one on the endpoint ring. But usb-storage never does that; whenever it unlinks an URB, it always unlinks the earliest entry in the endpoint's queue. After unlinking the first URB on the ring, you don't need to fill in its TRBs with no-ops. Instead, when you are ready to start the ring againk, just tell the host controller to move the dequeue pointer up to the start of the next surviving URB. (You'll also have to adjust the CYCLE bits of the TRBs that get skipped over, but that's trivial.) Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst
> From: Alan Stern > > This may be a foolish question, but why is xhci-hcd using no-op TRBs in > the first place? Because it can't write in a link TRB because other parts of the code use link TRBs to detect the end of the ring. The problem is that it can't put a link TRB in the middle of a chain of data fragments unless it is at a 'suitable' offset from the start of the data TD. Given arbitrary input fragmentation this means that you can't put a link TRB in the middle of a TD. (The documented alignment might be as high as 16kB.) If the rest of the code used a 'ring end pointer' then a link TRB could be used instead. David -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst
On Wed, 8 Jan 2014, David Laight wrote: > > From: Alan Stern > > > > This may be a foolish question, but why is xhci-hcd using no-op TRBs in > > the first place? > > Because it can't write in a link TRB because other parts of the > code use link TRBs to detect the end of the ring. > > The problem is that it can't put a link TRB in the middle of > a chain of data fragments unless it is at a 'suitable' offset > from the start of the data TD. Given arbitrary input fragmentation > this means that you can't put a link TRB in the middle of a TD. > (The documented alignment might be as high as 16kB.) > > If the rest of the code used a 'ring end pointer' then a link TRB > could be used instead. I see. Sounds like a poor design decision in hindsight. Can it be changed? Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Terrible performance of sequential O_DIRECT 4k writes in SAN environment. ~3 times slower then Solars 10 with the same HBA/Storage.
On 8 January 2014 17:26, Christoph Hellwig wrote: > > On my laptop SSD I get the following results (sometimes up to 200MB/s, > sometimes down to 100MB/s, always in the 40k to 50k IOps range): > > time elapsed (sec.):5 > bandwidth (MiB/s): 160.00 > IOps: 40960.00 Any direct attached storage I've tried was faster for me as well, indeed. I have already posted IIRC "06:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2208 [Thunderbolt] (rev 05)" - 1Gb BBU RAM sysbench seqwr aio 4k: 326.24Mb/sec 20879.56 Requests/sec That is good that you mentioned SSD. I've tried fnic HBA zoned to EMC XtremIO (SSD only based storage) 14.43Mb/sec 3693.65 Requests/sec for sequential 4k. So far I've seen so massive degradation only in SAN environment. I started my investigation with RHEL6.5 kernel so below table is from it but the trend is the same as for mainline it seems. Chunk size Bandwidth MiB/s 64M512 32M510 16M492 8M 451 4M 436 2M 350 1M 256 512K 191 256K 165 128K 142 64K 101 32K 65 16K 39 8K 20 4K 11 > > The IOps are more than the hardware is physically capable of, but given > that you didn't specify O_SYNC this seems sensible given that we never > have to flush the disk cache. > > Could it be that your array has WCE=0? In Linux we'll never enable the > cache automatically, but Solaris does at least when using ZFS. Try > running: > >sdparm --set=WCE /dev/sdX > > and try again. ZFS is not supporting direct IO so that was UFS. I tried to do sdparm --set=WCE /dev/sdX on the same fnic/XtremIO however this is multipath and for second half of 4 LUNs that failed (probably this is normal) Results have not changed much: 13.317Mb/sec 3409.26 Requests/sec [root@dca-poc-gtsxdb3 mnt]# multipath -ll mpathb (3514f0c5c11a0002d) dm-0 XtremIO,XtremApp size=50G features='0' hwhandler='0' wp=rw `-+- policy='round-robin 0' prio=1 status=active |- 0:0:4:1 sdg 8:96 active ready running |- 0:0:5:1 sdh 8:112 active ready running |- 1:0:4:1 sdo 8:224 active ready running `- 1:0:5:1 sdp 8:240 active ready running [root@dca-poc-gtsxdb3 mnt]# sdparm --set=WCE /dev/sdg /dev/sdg: XtremIO XtremApp 1.05 [root@dca-poc-gtsxdb3 mnt]# sdparm --set=WCE /dev/sdh /dev/sdh: XtremIO XtremApp 1.05 [root@dca-poc-gtsxdb3 mnt]# sdparm --set=WCE /dev/sdo /dev/sdo: XtremIO XtremApp 1.05 mode sense command failed, unit attention change_mode_page: failed fetching page: Caching (SBC) # ^ [root@dca-poc-gtsxdb3 mnt]# sdparm --set=WCE /dev/sdp /dev/sdp: XtremIO XtremApp 1.05 mode sense command failed, unit attention change_mode_page: failed fetching page: Caching (SBC) # ^ [root@dca-poc-gtsxdb3 mnt]# -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst
> From: Alan Stern > On Wed, 8 Jan 2014, David Laight wrote: > > > > From: Alan Stern > > > > > > This may be a foolish question, but why is xhci-hcd using no-op TRBs in > > > the first place? > > > > Because it can't write in a link TRB because other parts of the > > code use link TRBs to detect the end of the ring. > > > > The problem is that it can't put a link TRB in the middle of > > a chain of data fragments unless it is at a 'suitable' offset > > from the start of the data TD. Given arbitrary input fragmentation > > this means that you can't put a link TRB in the middle of a TD. > > (The documented alignment might be as high as 16kB.) > > > > If the rest of the code used a 'ring end pointer' then a link TRB > > could be used instead. > > I see. Sounds like a poor design decision in hindsight. Can it be > changed? Anything can be changed :-) But it is a bit pervasive. Padding out with nops isolated the change. David -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: REQ_PM vs REQ_TYPE_PM_RESUME
On Tue, 7 Jan 2014, Phillip Susi wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA512 > > On 01/07/2014 02:18 PM, Alan Stern wrote: > > I don't know how you manually spin down your disk, but one > > reasonable way to do it is to set the autosuspend timeout to 0. If > > you use this approach and apply my patch, the disk should remain > > spun down when the system resumes. > > The traditional method before the recently added block pm feature was > with hdparm -y. A real problem here is that hdparm does not interact with the runtime PM part of the kernel. They work independently and can interfere with each other. > > Right. If you hadn't put the whole system to sleep, the disk > > would have gone into runtime suspend 2 minutes later (assuming you > > didn't use it in the meantime). Whereas since you did put the > > whole system to sleep, the disk will go into runtime suspend 5 > > minutes after the system wakes up -- not 2 minutes later. I.e., > > the timer has been reset from 2 minutes back to 5. > > What resets the timer? It should only be reset when IO is done. And I/O _was_ done. To spin up the disk requires sending it a START command. That's I/O. Okay, you may not think that is a valid argument. Still, the timer has to be set to _something_, and during the resume we don't know how much time was remaining as of the suspend. (I suppose we could store this information when the suspend occurs, but currently we don't.) > yes, it is exactly that wakeup and turn around and suspend again that > I'm trying to avoid; it puts unnecessary wear and tear on the disk. In essence, you want your disk's state to move directly from unpowered (system asleep) to runtime-suspended with no intermediate active -- but only if the disk doesn't spin up all by itself. One problem: How can the kernel tell whether the disk has spun up by itself? I don't think it can be done at the SCSI level. Can it be done at the ATA level (and without using the SCSI stack, as that would require the disk to go out of runtime suspend and spin up anyway)? > > How is the kernel supposed to know that you finished using the > > disk before suspending the system? After all, by setting the > > autosuspend timeout to 5 minutes, you have already told the kernel > > to keep the disk spun up if it has been used any time in the last 5 > > minutes, and you used it only 3 minutes ago. As far as the kernel > > can tell, you still want the disk to be spun up and ready for use. > > That remains true at the time of the system resume. > > That's the whole point: it doesn't know. What it does know is that > the disk is spun down on resume, and it has no reason to believe that > you will need it again soon. Actually, it does have a reason. Namely, you told it earlier to assume the disk will be needed again if it was used within the last 5 minutes. At the time of the system resume, the disk was last used 3 minutes ago (not counting the time the system was asleep). > This is especially true given that your > typical single ATA disk system will spin up the drive on its own > anyhow, so on systems with multiple scsi or ATA disks that explicitly > have been set to power up in standby, it is at least an even bet that > you won't be touching them all soon. Are you assuming these disks were all runtime-active before the system sleep? If so, the kernel is justified in assuming that you will be touching them all soon. > > You're forgetting that the same sd driver manages other types of > > devices than disk drives. Card readers and USB flash drives have > > no heads to retract and no spinning platters. They don't need > > START STOP commands (in fact, some of them probably would crash if > > they received such a command). > > They are irrelevant since they ignore the command anyhow. Not the ones that crash when they receive it. But this point is moot since manage_start_stop doesn't do what you want anyway. > > Now I'm really getting confused. Here, you are saying that ATA > > disks will always spin up, all by themselves, whenever the system > > resumes, and there's nothing the kernel can do to prevent it. But > > earlier you wrote: > > No, I'm saying some ( most in fact ) do, and some do not. Both cases > need to be handled. Runtime pm must not indicate the disk is > suspended when it spins up on its own after power on, which ATA disks > do by default. Oddly, the SCSI standard says that SCSI disks can be > configured behave this way as well rather than requiring an explicit > start command, though it doesn't say how and I've not heard of a way > to do this. I suppose this also applies to the USB flash drives and > SD card readers you mentioned that are managed by sd. No, it doesn't. They don't spin, so they can't be configured either to spin up or not to spin up after power-on. > > Isn't this a contradiction? Or are you making a distinction > > between ATA and non-ATA disks? > > What? I have some disk drive
Re: REQ_PM vs REQ_TYPE_PM_RESUME
On Wed, 8 Jan 2014, Alan Stern wrote: > In essence, you want your disk's state to move directly from unpowered > (system asleep) to runtime-suspended with no intermediate active -- but > only if the disk doesn't spin up all by itself. > In the end, it sounds like what you want can be done fairly easily. It turns out there is a snag. The PM core is set up such that during system sleep transitions, there's no way to tell whether or not userspace has permitted runtime PM for a device. (This is because the core wants to prevent untimely runtime suspends from occurring in the middle of a system sleep transition, and to do so it uses the same mechanism as it does when userspace disallows runtime suspend.) As a result, we must not go directly from unpowered to runtime-suspended. Not without some sort of change to the PM core. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: REQ_PM vs REQ_TYPE_PM_RESUME
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 1/8/2014 2:00 AM, Aaron Lu wrote: > Then I would say we just keep the pm_request_resume call in their > system resume callback. For drives that have that cool feature > turned on, it will be in runtime active state while it's actually > in standby mode, not No, it won't since the runtime resume starts the drive. -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.17 (MingW32) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBAgAGBQJSzadoAAoJEI5FoCIzSKrwQWsIAKt/F+eiYYrxxjx58hFCwLxV 3E+CDiuf+pTyUUqPFgcGjyNrfoBO2nWXiCyg5XZ//pjPwbD4GenwSajgUIvKRw/6 0mDw4bb4CyU9/vqVKwHUHfM2jgsAuK9VFn+1fSomODy6B9ZYX6pDPr3jw5S7kDko nDaYhBX9L2/AE4oPo2qPLpAPMxiYFD9qIi8LcfW/Ha2Av7AneIyTXEvFZzyeDv2K WoH/KF59snFz9t4wwwfdqu1l1w93b+Tcn2zCpFBaA63TyrJKq4qpx9/W+INgBGol HKEyv1vh5UkaVSvffiFE/2b1lR4n0hL7LVMe1egazAlNPcsjXdBZv1amYJ7NltI= =N8WO -END PGP SIGNATURE- -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [SCSI] advansys: change buildtime warning into runtime error
On Tue, 2013-01-29 at 11:20 +0100, Paul Bolle wrote: > On Mon, 2012-11-05 at 11:58 +0100, Paul Bolle wrote: > > Building advansys.o triggers this warning: > > drivers/scsi/advansys.c:71:2: warning: #warning this driver is still > > not properly converted to the DMA API [-Wcpp] > > > > This warning can be traced back to a patch called "advansys: add warning > > and convert #includes" which was included in v2.6.10. That patch also > > marked the driver as BROKEN. > > > > Commit 4661e3eace2c7b8433476b5bf0ee437ab3c7dfd4 ("[SCSI] advansys > > driver: limp along on x86") enabled this driver for x86-32. And commit > > 9d511a4b29de6764931343d03e493f2e04df0271 ("[SCSI] advansys: Changes to > > work on parisc") enabled this driver for all architectures. But the > > commit explanation stated: > > I haven't removed the #warning yet because virt_to_bus/bus_to_virt are > > only eliminated for narrow boards. Wide boards need more work. > > > > Five years have passed and, apparently, those wide boards still need > > more work. So let's change the buildtime warning into a runtime error, > > only printed for those wide boards. Perhaps that might push the people > > using those wide boards to convert this driver. And for all others > > there's now one less buildtime warning to ignore. > > > > Signed-off-by: Paul Bolle > > --- > > Compile tested only. I don't have any AdvanSys SCSI boards (neither > > narrow nor wide). > > The date of this message suggests I submitted this patch for a warning > seen while building v3.7-rc4. An identical warning can be seen while > building v3.8-rc5. What's the status of my patch? Did anyone find some > time to have a look at it? I've been carrying this patch locally for over a year. Is there any chance of someone trying to remove this buildtime warning? I'm inclined to submit a path to Fedora - my local builds use their .config as a base - to just disable this driver. It seems that would increase my chances of finally shutting down this warning. Paul Bolle -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [SCSI] advansys: change buildtime warning into runtime error
On Wednesday 08 January 2014 21:05:14 Paul Bolle wrote: > On Tue, 2013-01-29 at 11:20 +0100, Paul Bolle wrote: > > On Mon, 2012-11-05 at 11:58 +0100, Paul Bolle wrote: > > > Building advansys.o triggers this warning: > > > drivers/scsi/advansys.c:71:2: warning: #warning this driver is > > > still not properly converted to the DMA API [-Wcpp] > > > > > > This warning can be traced back to a patch called "advansys: add > > > warning and convert #includes" which was included in v2.6.10. That > > > patch also marked the driver as BROKEN. > > > > > > Commit 4661e3eace2c7b8433476b5bf0ee437ab3c7dfd4 ("[SCSI] advansys > > > driver: limp along on x86") enabled this driver for x86-32. And commit > > > 9d511a4b29de6764931343d03e493f2e04df0271 ("[SCSI] advansys: Changes to > > > work on parisc") enabled this driver for all architectures. But the > > > commit explanation stated: > > > I haven't removed the #warning yet because virt_to_bus/bus_to_virt > > > are only eliminated for narrow boards. Wide boards need more work. > > > > > > Five years have passed and, apparently, those wide boards still need > > > more work. So let's change the buildtime warning into a runtime error, > > > only printed for those wide boards. Perhaps that might push the people > > > using those wide boards to convert this driver. And for all others > > > there's now one less buildtime warning to ignore. > > > > > > Signed-off-by: Paul Bolle > > > --- > > > Compile tested only. I don't have any AdvanSys SCSI boards (neither > > > narrow nor wide). > > > > The date of this message suggests I submitted this patch for a warning > > seen while building v3.7-rc4. An identical warning can be seen while > > building v3.8-rc5. What's the status of my patch? Did anyone find some > > time to have a look at it? > > I've been carrying this patch locally for over a year. Is there any > chance of someone trying to remove this buildtime warning? > > I'm inclined to submit a path to Fedora - my local builds use > their .config as a base - to just disable this driver. It seems that > would increase my chances of finally shutting down this warning. > > > Paul Bolle The driver works fine with narrow boards. Please don't break working drivers because of some stupid warning. -- Ondrej Zary -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: REQ_PM vs REQ_TYPE_PM_RESUME
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 1/8/2014 12:46 PM, Alan Stern wrote: > I/O _was_ done. To spin up the disk requires sending it a START > command. That's I/O. Yes, currently, but that's exactly what I'm trying to get rid of. > In essence, you want your disk's state to move directly from > unpowered (system asleep) to runtime-suspended with no intermediate > active -- but only if the disk doesn't spin up all by itself. > > One problem: How can the kernel tell whether the disk has spun up > by itself? I don't think it can be done at the SCSI level. Can it > be done at the ATA level (and without using the SCSI stack, as that > would require the disk to go out of runtime suspend and spin up > anyway)? You issue a REQUEST SENSE command and that returns status indicating whether the drive is stopped, or in standby. See my patches. One of them fixes the scsi-ata translation to issue the ATA CHECK power command to see if the drive is suspended, as SAT-3 calls for. > Actually, it does have a reason. Namely, you told it earlier to > assume the disk will be needed again if it was used within the last > 5 minutes. At the time of the system resume, the disk was last > used 3 minutes ago (not counting the time the system was asleep). You're looking at it backwards. You aren't telling it to assume the disk will be needed again if it was used within the last 5 minutes; you are telling it that it can assume it *won't* be needed again soon if not used in the last 5 minutes, and therefore, it is ok to shut it down. Also the act of suspending the system is a pretty clear breaking point in general activity. You normally stop your work and quiesce the system before you suspend it. If I finish my work and copy it to my backup drive, then suspend the system, and my wife comes by an hour later to browse the web, she's not going to be touching my backup disk. > Are you assuming these disks were all runtime-active before the > system sleep? If so, the kernel is justified in assuming that you > will be touching them all soon. No it isn't. Absence of evidence is not evidence of absence. I listed several use cases where the drive won't be runtime suspended before the system suspend, yet will not be accessed any time soon after resume. > Not the ones that crash when they receive it. But this point is > moot since manage_start_stop doesn't do what you want anyway. Any that crash when they get it either are already unusable or have a quirk registered to inhibit the command, which would still apply. > No, it doesn't. They don't spin, so they can't be configured > either to spin up or not to spin up after power-on. Which means we shouldn't be setting the runtime status to suspended after system resume... that's the second case. > What is PuiS? Power up in Standby. > In the end, it sounds like what you want can be done fairly easily. > But only if the kernel has a reliable way to tell whether or not a > disk is spinning (or is spinning up). That's why my patches are attempting to do: use REQUEST SENSE to ask the disk if it is spinning up or not. -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.17 (MingW32) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBAgAGBQJSzbMqAAoJEI5FoCIzSKrwCYwIAKMwlRovIdfO6rSgZL7pufFG LOV/9apHzqldPPLJ5BPk0A69Ok/TimrQax06HTPyGJ1/MmOrEcjJzb482mJ4ixpx b7DP3De+RCEUifrfIlI/U+fWLHpw4DOYToSy5ZjNZbF/cDz43k85arAwRK/Bkb++ C2F+YfeQyLcR0KjToj8MK9evlGY6oKfEYx9QCjBEgiaXRjIHfu0AuLn7y7AATK6n pFBzgq4cNSalob9I4WdwejzJz/RzqQFujtknB+nAaKhESqLCoclJLSzwNQyuxcRz qT3DckTyTBF7vGmYayaeERnukqWnDEBAQOc8qtOxx2rbYpivNHiqIc8sfrRoaJc= =2kXd -END PGP SIGNATURE- -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [SCSI] advansys: change buildtime warning into runtime error
On Wed, 2014-01-08 at 21:15 +0100, Ondrej Zary wrote: > The driver works fine with narrow boards. Please don't break working drivers > because of some stupid warning. My patch changed a buildtime warning for both narrow and wide boards in a runtime error on wide boards only. That wouldn't break the narrow boards. (I happen to own neither a narrow nor a wide board.) In seven years that buildtime warning has not resulted in a fix for wide boards, has it? So at this point that warning is rather pointless. Is anybody even using these wide boards? Paul Bolle -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 00/14] target: Initial support for DIF Type1+Type3 emulation
From: Nicholas Bellinger Hi MKP & SCSI folks, This series contains initial support for target mode DIF Type1+Type3 emulation within target core, RAMDISK_MCP device backend, and tcm_loop fabric driver. DIF emulation is enabled via a new 'pi_prot_type' device attribute within configfs, which is set after initial device configuration and before target fabric LUN export occurs. The DIF read/write verify emulation has been made generic enough so it can be used by other backend drivers (eg: FILEIO), as well as DIF v2 in the near future. Also note that the majority of the logic has been groked from existing scsi_debug.c code. The current plan is to enable basic support for emulated backends with tcm_loop for v3.14 code, and then move onto IBLOCK backend support (that requires BLOCK layer changes), and then onto other more complex fabric driver support + hardware offloads (iser-target that Sagi + Or are currently working on) and DIF support into KVM guest using virtio-scsi + vhost-scsi. Here is a quick snippet of the code in action with v3.13-rc3: [ 846.774085] scsi8 : TCM_Loopback [ 846.778044] scsi 8:0:1:0: Direct-Access LIO-ORG RAMDISK-MCP 4.0 PQ: 0 ANSI: 5 [ 846.780160] init_tag_map: adjusted depth to 256 [ 846.781309] init_tag_map: adjusted depth to 256 [ 846.782424] init_tag_map: adjusted depth to 256 [ 846.783685] sd 8:0:1:0: Attached scsi generic sg1 type 0 [ 846.783923] sd 8:0:1:0: [sda] Enabling DIF Type 1 protection [ 846.783929] sd 8:0:1:0: [sda] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB) [ 846.784111] sd 8:0:1:0: [sda] Write Protect is off [ 846.784114] sd 8:0:1:0: [sda] Mode Sense: 43 00 00 08 [ 846.784157] sd 8:0:1:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 846.785357] sda: unknown partition table [ 846.795464] sd 8:0:1:0: [sda] Enabling DIX T10-DIF-TYPE1-CRC protection [ 846.797278] sd 8:0:1:0: [sda] DIF application tag size 2 [ 846.798951] sd 8:0:1:0: [sda] Attached SCSI disk [ 901.477786] Entering sd_dif_type1_generate >>> [ 901.477809] SD TYPE1 Generate: sector: 0 guard_tag: 0x5fd1 app_tag: 0x ref_tag: 0 [ 901.477823] SD TYPE1 Generate: sector: 1 guard_tag: 0x41de app_tag: 0x ref_tag: 1 [ 901.477835] SD TYPE1 Generate: sector: 2 guard_tag: 0x27b1 app_tag: 0x ref_tag: 2 [ 901.477848] SD TYPE1 Generate: sector: 3 guard_tag: 0xeb4c app_tag: 0x ref_tag: 3 [ 901.477860] SD TYPE1 Generate: sector: 4 guard_tag: 0x5d81 app_tag: 0x ref_tag: 4 [ 901.477872] SD TYPE1 Generate: sector: 5 guard_tag: 0xda61 app_tag: 0x ref_tag: 5 [ 901.477884] SD TYPE1 Generate: sector: 6 guard_tag: 0x80df app_tag: 0x ref_tag: 6 [ 901.477896] SD TYPE1 Generate: sector: 7 guard_tag: 0x5ede app_tag: 0x ref_tag: 7 [ 901.477940] sd_prep_fn: CDB: 0x2a blk_integrity_rq(rq): 1 [ 901.481576] DIF WRITE sector: 0 guard_tag: 0x5fd1 app_tag: 0x ref_tag: 0 [ 901.481591] DIF WRITE sector: 1 guard_tag: 0x41de app_tag: 0x ref_tag: 1 [ 901.481615] DIF WRITE sector: 2 guard_tag: 0x27b1 app_tag: 0x ref_tag: 2 [ 901.481637] DIF WRITE sector: 3 guard_tag: 0xeb4c app_tag: 0x ref_tag: 3 [ 901.481649] DIF WRITE sector: 4 guard_tag: 0x5d81 app_tag: 0x ref_tag: 4 [ 901.481661] DIF WRITE sector: 5 guard_tag: 0xda61 app_tag: 0x ref_tag: 5 [ 901.481673] DIF WRITE sector: 6 guard_tag: 0x80df app_tag: 0x ref_tag: 6 [ 901.481685] DIF WRITE sector: 7 guard_tag: 0x5ede app_tag: 0x ref_tag: 7 [ 901.522883] sd_prep_fn: CDB: 0x28 blk_integrity_rq(rq): 1 [ 901.528637] DIF READ sector: 0 guard_tag: 0x5fd1 app_tag: 0x ref_tag: 0 [ 901.528655] DIF READ sector: 1 guard_tag: 0x41de app_tag: 0x ref_tag: 1 [ 901.528667] DIF READ sector: 2 guard_tag: 0x27b1 app_tag: 0x ref_tag: 2 [ 901.528679] DIF READ sector: 3 guard_tag: 0xeb4c app_tag: 0x ref_tag: 3 [ 901.528691] DIF READ sector: 4 guard_tag: 0x5d81 app_tag: 0x ref_tag: 4 [ 901.528703] DIF READ sector: 5 guard_tag: 0xda61 app_tag: 0x ref_tag: 5 [ 901.528715] DIF READ sector: 6 guard_tag: 0x80df app_tag: 0x ref_tag: 6 [ 901.528727] DIF READ sector: 7 guard_tag: 0x5ede app_tag: 0x ref_tag: 7 [ 901.528788] Entering bio_integrity_verify sector: 0 >>> [ 901.535807] bio_integrity_verify: bio->bi_idx: 1 bio->bi_vcnt: 1 [ 901.538222] SD TYPE1 Verify sector: 0 guard_tag: 0x5fd1 app_tag: 0x ref_tag: 0 [ 901.538226] SD TYPE1 Verify sector: 1 guard_tag: 0x41de app_tag: 0x ref_tag: 1 [ 901.538230] SD TYPE1 Verify sector: 2 guard_tag: 0x27b1 app_tag: 0x ref_tag: 2 [ 901.538234] SD TYPE1 Verify sector: 3 guard_tag: 0xeb4c app_tag: 0x ref_tag: 3 [ 901.538238] SD TYPE1 Verify sector: 4 guard_tag: 0x5d81 app_tag: 0x ref_tag: 4 [ 901.538241] SD TYPE1 Verify sector: 5 guard_tag: 0xda61 app_tag: 0x ref_tag: 5 [ 901.538245] SD TYPE1 Verify sector: 6 guard_tag: 0x80df app_tag: 0x ref_tag: 6 [ 901.538248] SD TYPE1 Verify sector: 7 guard_tag: 0x5ed
[PATCH 06/14] target/spc: Add protection related bits to INQUIRY EVPD=0x86
From: Nicholas Bellinger This patch updates spc_emulate_evpd_86() (extended INQUIRY) to report GRD_CHK (Guard Check) and REF_CHK (Reference Check) bits when DIF emulation is enabled by the backend device. Cc: Martin K. Petersen Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Sagi Grimberg Cc: Or Gerlitz Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_spc.c |9 + 1 file changed, 9 insertions(+) diff --git a/drivers/target/target_core_spc.c b/drivers/target/target_core_spc.c index 6b06f01..d99eb95 100644 --- a/drivers/target/target_core_spc.c +++ b/drivers/target/target_core_spc.c @@ -441,6 +441,15 @@ spc_emulate_evpd_86(struct se_cmd *cmd, unsigned char *buf) struct se_device *dev = cmd->se_dev; buf[3] = 0x3c; + /* +* Set GRD_CHK + REF_CHK for TYPE1 protection, or GRD_CHK +* only for TYPE3 protection. +*/ + if (dev->dev_attrib.pi_prot_type == TARGET_DIF_TYPE1_PROT) + buf[4] = 0x5; + else if (dev->dev_attrib.pi_prot_type == TARGET_DIF_TYPE3_PROT) + buf[4] = 0x4; + /* Set HEADSUP, ORDSUP, SIMPSUP */ buf[5] = 0x07; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 10/14] target: Add protection SGLs to target_submit_cmd_map_sgls
From: Nicholas Bellinger This patch adds support to target_submit_cmd_map_sgls() for accepting 'sgl_prot' + 'sgl_prot_count' parameters for DIF protection information. Note the passed parameters are stored at se_cmd->t_prot_sg and se_cmd->t_prot_nents respectively. Also, update tcm_loop and vhost-scsi fabrics usage of target_submit_cmd_map_sgls() to take into account the new parameters. Cc: Martin K. Petersen Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Sagi Grimberg Cc: Or Gerlitz Signed-off-by: Nicholas Bellinger --- drivers/target/loopback/tcm_loop.c |2 +- drivers/target/target_core_transport.c | 16 ++-- drivers/vhost/scsi.c |2 +- include/target/target_core_fabric.h|3 ++- 4 files changed, 18 insertions(+), 5 deletions(-) diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c index 1b41e67..c6f1427 100644 --- a/drivers/target/loopback/tcm_loop.c +++ b/drivers/target/loopback/tcm_loop.c @@ -217,7 +217,7 @@ static void tcm_loop_submission_work(struct work_struct *work) scsi_bufflen(sc), tcm_loop_sam_attr(sc), sc->sc_data_direction, 0, scsi_sglist(sc), scsi_sg_count(sc), - sgl_bidi, sgl_bidi_count); + sgl_bidi, sgl_bidi_count, NULL, 0); if (rc < 0) { set_host_byte(sc, DID_NO_CONNECT); goto out_done; diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 707ee17..65b42b4 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -1284,6 +1284,8 @@ transport_generic_map_mem_to_cmd(struct se_cmd *cmd, struct scatterlist *sgl, * @sgl_count: scatterlist count for unidirectional mapping * @sgl_bidi: struct scatterlist memory for bidirectional READ mapping * @sgl_bidi_count: scatterlist count for bidirectional READ mapping + * @sgl_prot: struct scatterlist memory protection information + * @sgl_prot_count: scatterlist count for protection information * * Returns non zero to signal active I/O shutdown failure. All other * setup exceptions will be returned as a SCSI CHECK_CONDITION response, @@ -1296,7 +1298,8 @@ int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess unsigned char *cdb, unsigned char *sense, u32 unpacked_lun, u32 data_length, int task_attr, int data_dir, int flags, struct scatterlist *sgl, u32 sgl_count, - struct scatterlist *sgl_bidi, u32 sgl_bidi_count) + struct scatterlist *sgl_bidi, u32 sgl_bidi_count, + struct scatterlist *sgl_prot, u32 sgl_prot_count) { struct se_portal_group *se_tpg; sense_reason_t rc; @@ -1338,6 +1341,14 @@ int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess target_put_sess_cmd(se_sess, se_cmd); return 0; } + /* +* Save pointers for SGLs containing protection information, +* if present. +*/ + if (sgl_prot_count) { + se_cmd->t_prot_sg = sgl_prot; + se_cmd->t_prot_nents = sgl_prot_count; + } rc = target_setup_cmd_from_cdb(se_cmd, cdb); if (rc != 0) { @@ -1380,6 +1391,7 @@ int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess return 0; } } + /* * Check if we need to delay processing because of ALUA * Active/NonOptimized primary access state.. @@ -1419,7 +1431,7 @@ int target_submit_cmd(struct se_cmd *se_cmd, struct se_session *se_sess, { return target_submit_cmd_map_sgls(se_cmd, se_sess, cdb, sense, unpacked_lun, data_length, task_attr, data_dir, - flags, NULL, 0, NULL, 0); + flags, NULL, 0, NULL, 0, NULL, 0); } EXPORT_SYMBOL(target_submit_cmd); diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index f175629..84488a8 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -889,7 +889,7 @@ static void tcm_vhost_submission_work(struct work_struct *work) cmd->tvc_lun, cmd->tvc_exp_data_len, cmd->tvc_task_attr, cmd->tvc_data_direction, TARGET_SCF_ACK_KREF, sg_ptr, cmd->tvc_sgl_count, - sg_bidi_ptr, sg_no_bidi); + sg_bidi_ptr, sg_no_bidi, NULL, 0); if (rc < 0) { transport_send_check_condition_and_sense(se_cmd, TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE, 0); diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h index 4cf4fda..0218d68 100644 --- a/include/target/target_core_fabric.h +++ b/include/target/target_core_fabric.h @@ -105,7 +105,8 @@ sense_rea
[PATCH 12/14] target/rd: Add support for protection SGL setup + release
From: Nicholas Bellinger This patch adds rd_build_prot_space() + rd_release_prot_space() logic to setup + release protection information scatterlists. It also adds rd_init_prot() + rd_free_prot() se_subsystem_api callbacks used by target core code for setup + release of protection information. Cc: Martin K. Petersen Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Sagi Grimberg Cc: Or Gerlitz Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_rd.c | 81 +++ drivers/target/target_core_rd.h |4 ++ 2 files changed, 85 insertions(+) diff --git a/drivers/target/target_core_rd.c b/drivers/target/target_core_rd.c index fdc5022..dd99844 100644 --- a/drivers/target/target_core_rd.c +++ b/drivers/target/target_core_rd.c @@ -226,6 +226,64 @@ static int rd_build_device_space(struct rd_dev *rd_dev) return 0; } +static void rd_release_prot_space(struct rd_dev *rd_dev) +{ + struct rd_dev_sg_table *sg_table; + u32 page_count; + + if (!rd_dev->sg_prot_array || !rd_dev->sg_prot_count) + return; + + sg_table = rd_dev->sg_prot_array; + + page_count = rd_release_sgl_table(rd_dev, rd_dev->sg_prot_array, + rd_dev->sg_prot_count); + + pr_debug("CORE_RD[%u] - Released protection space for Ramdisk" +" Device ID: %u, pages %u in %u tables total bytes %lu\n", +rd_dev->rd_host->rd_host_id, rd_dev->rd_dev_id, page_count, +rd_dev->sg_table_count, (unsigned long)page_count * PAGE_SIZE); + + rd_dev->sg_prot_array = NULL; + rd_dev->sg_prot_count = 0; +} + +static int rd_build_prot_space(struct rd_dev *rd_dev, int prot_length) +{ + struct rd_dev_sg_table *sg_table; + u32 total_sg_needed, sg_tables; + u32 max_sg_per_table = (RD_MAX_ALLOCATION_SIZE / + sizeof(struct scatterlist)); + int rc; + + if (rd_dev->rd_flags & RDF_NULLIO) + return 0; + + total_sg_needed = rd_dev->rd_page_count / prot_length; + + sg_tables = (total_sg_needed / max_sg_per_table) + 1; + + sg_table = kzalloc(sg_tables * sizeof(struct rd_dev_sg_table), GFP_KERNEL); + if (!sg_table) { + pr_err("Unable to allocate memory for Ramdisk protection" + " scatterlist tables\n"); + return -ENOMEM; + } + + rd_dev->sg_prot_array = sg_table; + rd_dev->sg_prot_count = sg_tables; + + rc = rd_allocate_sgl_table(rd_dev, sg_table, total_sg_needed, 0xff); + if (rc) + return rc; + + pr_debug("CORE_RD[%u] - Built Ramdisk Device ID: %u prot space of" +" %u pages in %u tables\n", rd_dev->rd_host->rd_host_id, +rd_dev->rd_dev_id, total_sg_needed, rd_dev->sg_prot_count); + + return 0; +} + static struct se_device *rd_alloc_device(struct se_hba *hba, const char *name) { struct rd_dev *rd_dev; @@ -280,6 +338,7 @@ static void rd_free_device(struct se_device *dev) { struct rd_dev *rd_dev = RD_DEV(dev); + rd_release_prot_space(rd_dev); rd_release_device_space(rd_dev); kfree(rd_dev); } @@ -482,6 +541,26 @@ static sector_t rd_get_blocks(struct se_device *dev) return blocks_long; } +static int rd_init_prot(struct se_device *dev) +{ + struct rd_dev *rd_dev = RD_DEV(dev); + +if (!dev->dev_attrib.pi_prot_type) + return 0; + + return rd_build_prot_space(rd_dev, dev->prot_length); +} + +static void rd_free_prot(struct se_device *dev) +{ + struct rd_dev *rd_dev = RD_DEV(dev); + + if (dev->dev_attrib.pi_prot_type) + return; + + rd_release_prot_space(rd_dev); +} + static struct sbc_ops rd_sbc_ops = { .execute_rw = rd_execute_rw, }; @@ -507,6 +586,8 @@ static struct se_subsystem_api rd_mcp_template = { .show_configfs_dev_params = rd_show_configfs_dev_params, .get_device_type= sbc_get_device_type, .get_blocks = rd_get_blocks, + .init_prot = rd_init_prot, + .free_prot = rd_free_prot, }; int __init rd_module_init(void) diff --git a/drivers/target/target_core_rd.h b/drivers/target/target_core_rd.h index 1789d1e..cc46a6a 100644 --- a/drivers/target/target_core_rd.h +++ b/drivers/target/target_core_rd.h @@ -33,8 +33,12 @@ struct rd_dev { u32 rd_page_count; /* Number of SG tables in sg_table_array */ u32 sg_table_count; + /* Number of SG tables in sg_prot_array */ + u32 sg_prot_count; /* Array of rd_dev_sg_table_t containing scatterlists */ struct rd_dev_sg_table *sg_table_array; + /* Array of rd_dev_sg_table containing protection scatterlists */ + struct rd_dev_sg_table *sg_prot_array; /* Ramdisk HBA device is connected to */ stru
[PATCH 14/14] tcm_loop: Enable DIF/DIX modes in SCSI host LLD
From: Nicholas Bellinger This patch updates tcm_loop_driver_probe() to set protection information using scsi_host_set_prot() and scsi_host_set_guard(), which currently enabled all modes of DIF/DIX protection, minus DIF TYPE0. Also, update tcm_loop_submission_work() to pass struct scsi_cmnd related protection into target_submit_cmd_map_sgls() during CDB dispatch. Cc: Martin K. Petersen Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Sagi Grimberg Cc: Or Gerlitz Signed-off-by: Nicholas Bellinger --- drivers/target/loopback/tcm_loop.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c index c6f1427..8d3355f 100644 --- a/drivers/target/loopback/tcm_loop.c +++ b/drivers/target/loopback/tcm_loop.c @@ -217,7 +217,8 @@ static void tcm_loop_submission_work(struct work_struct *work) scsi_bufflen(sc), tcm_loop_sam_attr(sc), sc->sc_data_direction, 0, scsi_sglist(sc), scsi_sg_count(sc), - sgl_bidi, sgl_bidi_count, NULL, 0); + sgl_bidi, sgl_bidi_count, + scsi_prot_sglist(sc), scsi_prot_sg_count(sc)); if (rc < 0) { set_host_byte(sc, DID_NO_CONNECT); goto out_done; @@ -462,7 +463,7 @@ static int tcm_loop_driver_probe(struct device *dev) { struct tcm_loop_hba *tl_hba; struct Scsi_Host *sh; - int error; + int error, host_prot; tl_hba = to_tcm_loop_hba(dev); @@ -486,6 +487,13 @@ static int tcm_loop_driver_probe(struct device *dev) sh->max_channel = 0; sh->max_cmd_len = TL_SCSI_MAX_CMD_LEN; + host_prot = SHOST_DIF_TYPE1_PROTECTION | SHOST_DIF_TYPE2_PROTECTION | + SHOST_DIF_TYPE3_PROTECTION | SHOST_DIX_TYPE1_PROTECTION | + SHOST_DIX_TYPE2_PROTECTION | SHOST_DIX_TYPE3_PROTECTION; + + scsi_host_set_prot(sh, host_prot); + scsi_host_set_guard(sh, SHOST_DIX_GUARD_CRC); + error = scsi_add_host(sh, &tl_hba->dev); if (error) { pr_err("%s: scsi_add_host failed\n", __func__); -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 11/14] target/rd: Refactor rd_build_device_space + rd_release_device_space
From: Nicholas Bellinger This patch refactors rd_build_device_space() + rd_release_device_space() into rd_allocate_sgl_table() + rd_release_device_space() so that they may be used seperatly for setup + release of protection information scatterlists. Also add explicit memset of pages within rd_allocate_sgl_table() based upon passed 'init_payload' value. Cc: Martin K. Petersen Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Sagi Grimberg Cc: Or Gerlitz Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_rd.c | 116 --- 1 file changed, 71 insertions(+), 45 deletions(-) diff --git a/drivers/target/target_core_rd.c b/drivers/target/target_core_rd.c index 4ffe5f2..fdc5022 100644 --- a/drivers/target/target_core_rd.c +++ b/drivers/target/target_core_rd.c @@ -78,23 +78,14 @@ static void rd_detach_hba(struct se_hba *hba) hba->hba_ptr = NULL; } -/* rd_release_device_space(): - * - * - */ -static void rd_release_device_space(struct rd_dev *rd_dev) +static u32 rd_release_sgl_table(struct rd_dev *rd_dev, struct rd_dev_sg_table *sg_table, +u32 sg_table_count) { - u32 i, j, page_count = 0, sg_per_table; - struct rd_dev_sg_table *sg_table; struct page *pg; struct scatterlist *sg; + u32 i, j, page_count = 0, sg_per_table; - if (!rd_dev->sg_table_array || !rd_dev->sg_table_count) - return; - - sg_table = rd_dev->sg_table_array; - - for (i = 0; i < rd_dev->sg_table_count; i++) { + for (i = 0; i < sg_table_count; i++) { sg = sg_table[i].sg_table; sg_per_table = sg_table[i].rd_sg_count; @@ -105,16 +96,31 @@ static void rd_release_device_space(struct rd_dev *rd_dev) page_count++; } } - kfree(sg); } + kfree(sg_table); + return page_count; +} + +static void rd_release_device_space(struct rd_dev *rd_dev) +{ + struct rd_dev_sg_table *sg_table; + u32 page_count; + + if (!rd_dev->sg_table_array || !rd_dev->sg_table_count) + return; + + sg_table = rd_dev->sg_table_array; + + page_count = rd_release_sgl_table(rd_dev, rd_dev->sg_table_array, + rd_dev->sg_table_count); + pr_debug("CORE_RD[%u] - Released device space for Ramdisk" " Device ID: %u, pages %u in %u tables total bytes %lu\n", rd_dev->rd_host->rd_host_id, rd_dev->rd_dev_id, page_count, rd_dev->sg_table_count, (unsigned long)page_count * PAGE_SIZE); - kfree(sg_table); rd_dev->sg_table_array = NULL; rd_dev->sg_table_count = 0; } @@ -124,38 +130,15 @@ static void rd_release_device_space(struct rd_dev *rd_dev) * * */ -static int rd_build_device_space(struct rd_dev *rd_dev) +static int rd_allocate_sgl_table(struct rd_dev *rd_dev, struct rd_dev_sg_table *sg_table, +u32 total_sg_needed, unsigned char init_payload) { - u32 i = 0, j, page_offset = 0, sg_per_table, sg_tables, total_sg_needed; + u32 i = 0, j, page_offset = 0, sg_per_table; u32 max_sg_per_table = (RD_MAX_ALLOCATION_SIZE / sizeof(struct scatterlist)); - struct rd_dev_sg_table *sg_table; struct page *pg; struct scatterlist *sg; - - if (rd_dev->rd_page_count <= 0) { - pr_err("Illegal page count: %u for Ramdisk device\n", - rd_dev->rd_page_count); - return -EINVAL; - } - - /* Don't need backing pages for NULLIO */ - if (rd_dev->rd_flags & RDF_NULLIO) - return 0; - - total_sg_needed = rd_dev->rd_page_count; - - sg_tables = (total_sg_needed / max_sg_per_table) + 1; - - sg_table = kzalloc(sg_tables * sizeof(struct rd_dev_sg_table), GFP_KERNEL); - if (!sg_table) { - pr_err("Unable to allocate memory for Ramdisk" - " scatterlist tables\n"); - return -ENOMEM; - } - - rd_dev->sg_table_array = sg_table; - rd_dev->sg_table_count = sg_tables; + unsigned char *p; while (total_sg_needed) { sg_per_table = (total_sg_needed > max_sg_per_table) ? @@ -186,16 +169,59 @@ static int rd_build_device_space(struct rd_dev *rd_dev) } sg_assign_page(&sg[j], pg); sg[j].length = PAGE_SIZE; + + p = kmap(pg); + memset(p, init_payload, PAGE_SIZE); + kunmap(pg); } page_offset += sg_per_table; total_sg_needed -= sg_per_table; } + return 0; +} + +static int rd_build_device_space(struct rd_dev *rd_dev) +{ + struct rd_dev_sg_table *sg_table; + u3
[PATCH 09/14] target/configfs: Expose protection device attributes
From: Nicholas Bellinger This patch adds support for exposing DIF protection device attributes via configfs. This includes: pi_prot_type: Protection Type (0, 1, 3 currently support) pi_prot_version: Protection Version (DIF v1 currently supported) pi_guard_type: Guard Type (1=DIF CRC, 2=IP CRC) Within se_dev_set_pi_prot_type() it also adds the se_subsystem_api device callbacks to setup per device protection information. Cc: Martin K. Petersen Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Sagi Grimberg Cc: Or Gerlitz Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_configfs.c | 12 ++ drivers/target/target_core_device.c | 65 + drivers/target/target_core_internal.h |2 + 3 files changed, 79 insertions(+) diff --git a/drivers/target/target_core_configfs.c b/drivers/target/target_core_configfs.c index 272755d..0f1101c 100644 --- a/drivers/target/target_core_configfs.c +++ b/drivers/target/target_core_configfs.c @@ -643,6 +643,15 @@ SE_DEV_ATTR(emulate_caw, S_IRUGO | S_IWUSR); DEF_DEV_ATTRIB(emulate_3pc); SE_DEV_ATTR(emulate_3pc, S_IRUGO | S_IWUSR); +DEF_DEV_ATTRIB(pi_prot_type); +SE_DEV_ATTR(pi_prot_type, S_IRUGO | S_IWUSR); + +DEF_DEV_ATTRIB_RO(pi_prot_version); +SE_DEV_ATTR_RO(pi_prot_version); + +DEF_DEV_ATTRIB(pi_guard_type); +SE_DEV_ATTR(pi_guard_type, S_IRUGO | S_IWUSR); + DEF_DEV_ATTRIB(enforce_pr_isids); SE_DEV_ATTR(enforce_pr_isids, S_IRUGO | S_IWUSR); @@ -702,6 +711,9 @@ static struct configfs_attribute *target_core_dev_attrib_attrs[] = { &target_core_dev_attrib_emulate_tpws.attr, &target_core_dev_attrib_emulate_caw.attr, &target_core_dev_attrib_emulate_3pc.attr, + &target_core_dev_attrib_pi_prot_type.attr, + &target_core_dev_attrib_pi_prot_version.attr, + &target_core_dev_attrib_pi_guard_type.attr, &target_core_dev_attrib_enforce_pr_isids.attr, &target_core_dev_attrib_is_nonrot.attr, &target_core_dev_attrib_emulate_rest_reord.attr, diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c index 207b340..2b59beb 100644 --- a/drivers/target/target_core_device.c +++ b/drivers/target/target_core_device.c @@ -918,6 +918,67 @@ int se_dev_set_emulate_3pc(struct se_device *dev, int flag) return 0; } +int se_dev_set_pi_prot_type(struct se_device *dev, int flag) +{ + int rc, old_prot = dev->dev_attrib.pi_prot_type; + + if (flag != 0 && flag != 1 && flag != 2 && flag != 3) { + pr_err("Illegal value %d for pi_prot_type\n", flag); + return -EINVAL; + } + if (flag == 2) { + pr_err("DIF TYPE2 protection currently not supported\n"); + return -ENOSYS; + } + if (!dev->transport->init_prot || !dev->transport->free_prot) { + pr_err("DIF protection not supported by backend: %s\n", + dev->transport->name); + return -ENOSYS; + } + if (!(dev->dev_flags & DF_CONFIGURED)) { + pr_err("DIF protection requires device to be configured\n"); + return -ENODEV; + } + if (dev->export_count) { + pr_err("dev[%p]: Unable to change SE Device PROT type while" + " export_count is %d\n", dev, dev->export_count); + return -EINVAL; + } + + dev->dev_attrib.pi_prot_type = flag; + + if (flag && !old_prot) { + rc = dev->transport->init_prot(dev); + if (rc) { + dev->dev_attrib.pi_prot_type = old_prot; + return rc; + } + } else if (!flag && old_prot) { + dev->transport->free_prot(dev); + } + pr_debug("dev[%p]: SE Device Protection Type: %d\n", dev, flag); + + return 0; +} + +int se_dev_set_pi_guard_type(struct se_device *dev, int flag) +{ + if (flag != 1 && flag != 2) { + pr_err("Illegal value %d for pi_guard_type\n", flag); + return -EINVAL; + } + if (dev->export_count) { + pr_err("dev[%p]: Unable to change SE Device GUARD type while" + " export_count is %d\n", dev, dev->export_count); + return -EINVAL; + } + + dev->dev_attrib.pi_guard_type = flag; + pr_debug("dev[%p]: SE Device Guard Type: %d\n", dev, flag); + + return 0; +} + int se_dev_set_enforce_pr_isids(struct se_device *dev, int flag) { if ((flag != 0) && (flag != 1)) { @@ -1415,6 +1476,7 @@ struct se_device *target_alloc_device(struct se_hba *hba, const char *name) dev->dev_link_magic = SE_DEV_LINK_MAGIC; dev->se_hba = hba; dev->transport = hba->transport; + dev->prot_length = sizeof(struct se_dif_v1_tuple); INIT_LIST_HEAD(&dev->dev_list); INIT_LIST_HEAD(&dev->dev_sep_list); @@ -1455,6 +1517,9 @@ struct se_device *target_alloc_device(struct
[PATCH 13/14] target/rd: Add DIF protection into rd_execute_rw
From: Nicholas Bellinger This patch adds support for DIF protection into rd_execute_rw() code for WRITE/READ I/O using sbc_dif_verify_[write,read]() logic. It also adds rd_get_prot_table() for locating protection SGLs assoicated with the ramdisk backend device. Cc: Martin K. Petersen Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Sagi Grimberg Cc: Or Gerlitz Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_rd.c | 67 +++ 1 file changed, 67 insertions(+) diff --git a/drivers/target/target_core_rd.c b/drivers/target/target_core_rd.c index dd99844..3fd51eb 100644 --- a/drivers/target/target_core_rd.c +++ b/drivers/target/target_core_rd.c @@ -363,6 +363,26 @@ static struct rd_dev_sg_table *rd_get_sg_table(struct rd_dev *rd_dev, u32 page) return NULL; } +static struct rd_dev_sg_table *rd_get_prot_table(struct rd_dev *rd_dev, u32 page) +{ + struct rd_dev_sg_table *sg_table; + u32 i, sg_per_table = (RD_MAX_ALLOCATION_SIZE / + sizeof(struct scatterlist)); + + i = page / sg_per_table; + if (i < rd_dev->sg_prot_count) { + sg_table = &rd_dev->sg_prot_array[i]; + if ((sg_table->page_start_offset <= page) && +(sg_table->page_end_offset >= page)) + return sg_table; + } + + pr_err("Unable to locate struct prot rd_dev_sg_table for page: %u\n", + page); + + return NULL; +} + static sense_reason_t rd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents, enum dma_data_direction data_direction) @@ -377,6 +397,7 @@ rd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents, u32 rd_page; u32 src_len; u64 tmp; + sense_reason_t rc; if (dev->rd_flags & RDF_NULLIO) { target_complete_cmd(cmd, SAM_STAT_GOOD); @@ -399,6 +420,29 @@ rd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents, data_direction == DMA_FROM_DEVICE ? "Read" : "Write", cmd->t_task_lba, rd_size, rd_page, rd_offset); + if ((cmd->se_cmd_flags & SCF_PROT) && se_dev->dev_attrib.pi_prot_type && + data_direction == DMA_TO_DEVICE) { + sector_t sector = cmd->data_length / se_dev->dev_attrib.block_size; + struct rd_dev_sg_table *prot_table; + struct scatterlist *prot_sg; + u32 prot_offset, prot_page; + + tmp = cmd->t_task_lba * se_dev->prot_length; + prot_offset = do_div(tmp, PAGE_SIZE); + prot_page = tmp; + + prot_table = rd_get_prot_table(dev, prot_page); + if (!prot_table) + return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE; + + prot_sg = &prot_table->sg_table[prot_page - prot_table->page_start_offset]; + + rc = sbc_dif_verify_write(cmd, cmd->t_task_lba, sector, 0, + prot_sg, prot_offset); + if (rc) + return rc; + } + src_len = PAGE_SIZE - rd_offset; sg_miter_start(&m, sgl, sgl_nents, data_direction == DMA_FROM_DEVICE ? @@ -460,6 +504,29 @@ rd_execute_rw(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents, } sg_miter_stop(&m); + if ((cmd->se_cmd_flags & SCF_PROT) && se_dev->dev_attrib.pi_prot_type && + data_direction == DMA_FROM_DEVICE) { + sector_t sector = cmd->data_length / se_dev->dev_attrib.block_size; + struct rd_dev_sg_table *prot_table; + struct scatterlist *prot_sg; + u32 prot_offset, prot_page; + + tmp = cmd->t_task_lba * se_dev->prot_length; + prot_offset = do_div(tmp, PAGE_SIZE); + prot_page = tmp; + + prot_table = rd_get_prot_table(dev, prot_page); + if (!prot_table) + return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE; + + prot_sg = &prot_table->sg_table[prot_page - prot_table->page_start_offset]; + + rc = sbc_dif_verify_read(cmd, cmd->t_task_lba, sector, 0, +prot_sg, prot_offset); + if (rc) + return rc; + } + target_complete_cmd(cmd, SAM_STAT_GOOD); return 0; } -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 07/14] target/sbc: Add P_TYPE + PROT_EN bits to READ_CAPACITY_16
From: Nicholas Bellinger This patch updates sbc_emulate_readcapacity_16() to set P_TYPE and PROT_EN bits when DIF emulation is enabled by the backend device. Cc: Martin K. Petersen Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Sagi Grimberg Cc: Or Gerlitz Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_sbc.c |5 + 1 file changed, 5 insertions(+) diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c index 366b9bb..22599e8 100644 --- a/drivers/target/target_core_sbc.c +++ b/drivers/target/target_core_sbc.c @@ -106,6 +106,11 @@ sbc_emulate_readcapacity_16(struct se_cmd *cmd) buf[9] = (dev->dev_attrib.block_size >> 16) & 0xff; buf[10] = (dev->dev_attrib.block_size >> 8) & 0xff; buf[11] = dev->dev_attrib.block_size & 0xff; + /* +* Set P_TYPE and PROT_EN bits for DIF support +*/ + if (dev->dev_attrib.pi_prot_type) + buf[12] = (dev->dev_attrib.pi_prot_type - 1) << 1 | 0x1; if (dev->transport->get_lbppbe) buf[13] = dev->transport->get_lbppbe(dev) & 0x0f; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 08/14] target/spc: Expose ATO bit in control mode page
From: Nicholas Bellinger This patch updates spc_modesense_control() to set the Application Tag Owner (ATO) bit when when DIF emulation is enabled by the backend device. Cc: Martin K. Petersen Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Sagi Grimberg Cc: Or Gerlitz Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_spc.c | 13 + 1 file changed, 13 insertions(+) diff --git a/drivers/target/target_core_spc.c b/drivers/target/target_core_spc.c index d99eb95..4360c80 100644 --- a/drivers/target/target_core_spc.c +++ b/drivers/target/target_core_spc.c @@ -799,6 +799,19 @@ static int spc_modesense_control(struct se_device *dev, u8 pc, u8 *p) * status (see SAM-4). */ p[5] = (dev->dev_attrib.emulate_tas) ? 0x40 : 0x00; + /* +* From spc4r30, section 7.5.7 Control mode page +* +* Application Tag Owner (ATO) bit set to one. +* +* If the ATO bit is set to one the device server shall not modify the +* LOGICAL BLOCK APPLICATION TAG field and, depending on the protection +* type, shall not modify the contents of the LOGICAL BLOCK REFERENCE +* TAG field. +*/ + if (dev->dev_attrib.pi_prot_type) + p[5] |= 0x80; + p[8] = 0xff; p[9] = 0xff; p[11] = 30; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 05/14] target/spc: Add protection bit to standard INQUIRY output
From: Nicholas Bellinger This patch updates spc_emulate_inquiry_std() to set the PROTECT bit when DIF emulation is enabled by the backend device. Cc: Martin K. Petersen Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Sagi Grimberg Cc: Or Gerlitz Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_spc.c |5 + 1 file changed, 5 insertions(+) diff --git a/drivers/target/target_core_spc.c b/drivers/target/target_core_spc.c index 021c3f4..6b06f01 100644 --- a/drivers/target/target_core_spc.c +++ b/drivers/target/target_core_spc.c @@ -100,6 +100,11 @@ spc_emulate_inquiry_std(struct se_cmd *cmd, unsigned char *buf) */ if (dev->dev_attrib.emulate_3pc) buf[5] |= 0x8; + /* +* Set Protection (PROTECT) bit when DIF has been enabled. +*/ + if (dev->dev_attrib.pi_prot_type) + buf[5] |= 0x1; buf[7] = 0x2; /* CmdQue=1 */ -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 02/14] target: Add DIF CHECK_CONDITION ASC/ASCQ exception cases
From: Nicholas Bellinger This patch adds support for DIF related CHECK_CONDITION ASC/ASCQ exception cases into transport_send_check_condition_and_sense(). This includes: LOGICAL BLOCK GUARD CHECK FAILED LOGICAL BLOCK APPLICATION TAG CHECK FAILED LOGICAL BLOCK REFERENCE TAG CHECK FAILED that used by DIF TYPE1 and TYPE3 failure cases. Cc: Martin K. Petersen Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Sagi Grimberg Cc: Or Gerlitz Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_transport.c | 30 ++ include/target/target_core_base.h |3 +++ 2 files changed, 33 insertions(+) diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 91953da..707ee17 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -2648,6 +2648,36 @@ transport_send_check_condition_and_sense(struct se_cmd *cmd, buffer[SPC_ASC_KEY_OFFSET] = 0x1d; buffer[SPC_ASCQ_KEY_OFFSET] = 0x00; break; + case TCM_LOGICAL_BLOCK_GUARD_CHECK_FAILED: + /* CURRENT ERROR */ + buffer[0] = 0x70; + buffer[SPC_ADD_SENSE_LEN_OFFSET] = 10; + /* ILLEGAL REQUEST */ + buffer[SPC_SENSE_KEY_OFFSET] = ILLEGAL_REQUEST; + /* LOGICAL BLOCK GUARD CHECK FAILED */ + buffer[SPC_ASC_KEY_OFFSET] = 0x10; + buffer[SPC_ASCQ_KEY_OFFSET] = 0x01; + break; + case TCM_LOGICAL_BLOCK_APP_TAG_CHECK_FAILED: + /* CURRENT ERROR */ + buffer[0] = 0x70; + buffer[SPC_ADD_SENSE_LEN_OFFSET] = 10; + /* ILLEGAL REQUEST */ + buffer[SPC_SENSE_KEY_OFFSET] = ILLEGAL_REQUEST; + /* LOGICAL BLOCK APPLICATION TAG CHECK FAILED */ + buffer[SPC_ASC_KEY_OFFSET] = 0x10; + buffer[SPC_ASCQ_KEY_OFFSET] = 0x02; + break; + case TCM_LOGICAL_BLOCK_REF_TAG_CHECK_FAILED: + /* CURRENT ERROR */ + buffer[0] = 0x70; + buffer[SPC_ADD_SENSE_LEN_OFFSET] = 10; + /* ILLEGAL REQUEST */ + buffer[SPC_SENSE_KEY_OFFSET] = ILLEGAL_REQUEST; + /* LOGICAL BLOCK REFERENCE TAG CHECK FAILED */ + buffer[SPC_ASC_KEY_OFFSET] = 0x10; + buffer[SPC_ASCQ_KEY_OFFSET] = 0x03; + break; case TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE: default: /* CURRENT ERROR */ diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h index 15f402c..9a6e091 100644 --- a/include/target/target_core_base.h +++ b/include/target/target_core_base.h @@ -206,6 +206,9 @@ enum tcm_sense_reason_table { TCM_OUT_OF_RESOURCES= R(0x12), TCM_PARAMETER_LIST_LENGTH_ERROR = R(0x13), TCM_MISCOMPARE_VERIFY = R(0x14), + TCM_LOGICAL_BLOCK_GUARD_CHECK_FAILED= R(0x15), + TCM_LOGICAL_BLOCK_APP_TAG_CHECK_FAILED = R(0x16), + TCM_LOGICAL_BLOCK_REF_TAG_CHECK_FAILED = R(0x17), #undef R }; -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 04/14] target/sbc: Add DIF TYPE1+TYPE3 read/write verify emulation
From: Nicholas Bellinger This patch adds support for DIF read/write verify emulation for TARGET_DIF_TYPE1_PROT + TARGET_DIF_TYPE3_PROT operation. This includes sbc_dif_verify_write() + sbc_dif_verify_read() calls accessable by backend drivers to perform DIF verify for SGL based data and protection information. Also included is sbc_dif_copy_prot() logic to copy protection information to/from backend provided protection SGLs. Based on scsi_debug.c DIF TYPE1+TYPE3 emulation. Cc: Martin K. Petersen Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Sagi Grimberg Cc: Or Gerlitz Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_sbc.c | 180 ++ include/target/target_core_backend.h |4 + 2 files changed, 184 insertions(+) diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c index 600ffcb..366b9bb 100644 --- a/drivers/target/target_core_sbc.c +++ b/drivers/target/target_core_sbc.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -998,3 +999,182 @@ err: return ret; } EXPORT_SYMBOL(sbc_execute_unmap); + +static sense_reason_t +sbc_dif_v1_verify(struct se_device *dev, struct se_dif_v1_tuple *sdt, + const void *p, sector_t sector, unsigned int ei_lba) +{ + int block_size = dev->dev_attrib.block_size; + __be16 csum; + + if (dev->dev_attrib.pi_guard_type == TARGET_DIX_GUARD_CRC) + csum = cpu_to_be16(crc_t10dif(p, block_size)); + else + csum = (__force __be16)ip_compute_csum(p, block_size); + + if (sdt->guard_tag != csum) { + pr_err("DIFv1 checksum failed on sector %llu guard tag 0x%04x" + " csum 0x%04x\n", (unsigned long long)sector, + be16_to_cpu(sdt->guard_tag), be16_to_cpu(csum)); + return TCM_LOGICAL_BLOCK_GUARD_CHECK_FAILED; + } + + if (dev->dev_attrib.pi_prot_type == TARGET_DIF_TYPE1_PROT && + be32_to_cpu(sdt->ref_tag) != (sector & 0x)) { + pr_err("DIFv1 Type 1 reference failed on sector: %llu tag: 0x%08x" + " sector MSB: 0x%08x\n", (unsigned long long)sector, + be32_to_cpu(sdt->ref_tag), (u32)(sector & 0x)); + return TCM_LOGICAL_BLOCK_REF_TAG_CHECK_FAILED; + } + + if (dev->dev_attrib.pi_prot_type == TARGET_DIF_TYPE2_PROT && + be32_to_cpu(sdt->ref_tag) != ei_lba) { + pr_err("DIFv1 Type 2 reference failed on sector: %llu tag: 0x%08x" + " ei_lba: 0x%08x\n", (unsigned long long)sector, + be32_to_cpu(sdt->ref_tag), ei_lba); + return TCM_LOGICAL_BLOCK_REF_TAG_CHECK_FAILED; + } + + return 0; +} + +static void +sbc_dif_copy_prot(struct se_cmd *cmd, unsigned int sectors, bool read, + struct scatterlist *sg, int sg_off) +{ + struct se_device *dev = cmd->se_dev; + struct scatterlist *psg; + void *paddr, *addr; + unsigned int i, len, left; + + left = sectors * dev->prot_length; + + for_each_sg(cmd->t_prot_sg, psg, cmd->t_prot_nents, i) { + + len = min(psg->length, left); + paddr = kmap_atomic(sg_page(psg)) + psg->offset; + addr = kmap_atomic(sg_page(sg)) + sg_off; + + if (read) + memcpy(paddr, addr, len); + else + memcpy(addr, paddr, len); + + left -= len; + kunmap_atomic(paddr); + kunmap_atomic(addr); + } +} + +sense_reason_t +sbc_dif_verify_write(struct se_cmd *cmd, sector_t start, unsigned int sectors, +unsigned int ei_lba, struct scatterlist *sg, int sg_off) +{ + struct se_device *dev = cmd->se_dev; + struct se_dif_v1_tuple *sdt; + struct scatterlist *dsg, *psg = cmd->t_prot_sg; + sector_t sector = start; + void *daddr, *paddr; + int i, j, offset = 0; + sense_reason_t rc; + + for_each_sg(cmd->t_data_sg, dsg, cmd->t_data_nents, i) { + daddr = kmap_atomic(sg_page(dsg)) + dsg->offset; + paddr = kmap_atomic(sg_page(psg)) + psg->offset; + + for (j = 0; j < dsg->length; j += dev->dev_attrib.block_size) { + + if (offset >= psg->length) { + kunmap_atomic(paddr); + psg = sg_next(psg); + paddr = kmap_atomic(sg_page(psg)) + psg->offset; + offset = 0; + } + + sdt = paddr + offset; + + pr_debug("DIF WRITE sector: %llu guard_tag: 0x%04x" +" app_tag: 0x%04x ref_tag: %u\n", +(unsigned long long)sector, sdt->guard_tag, +
[PATCH 01/14] target: Add DIF related base definitions
From: Nicholas Bellinger This patch adds DIF related definitions to target_core_base.h that includes enums for target_prot_op + target_prot_type + target_prot_version + target_guard_type + target_pi_error. Also included is struct se_dif_v1_tuple, along with changes to struct se_cmd, struct se_dev_attrib, and struct se_device. Also, add new se_subsystem_api->[init,free]_prot() callers used by target core code to setup backend specific protection information after the device has been configured. Enums taken from Sagi Grimberg's original patch. Cc: Martin K. Petersen Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Sagi Grimberg Cc: Or Gerlitz Signed-off-by: Nicholas Bellinger target: more defs Signed-off-by: Nicholas Bellinger --- include/target/target_core_backend.h |2 ++ include/target/target_core_base.h| 59 ++ 2 files changed, 61 insertions(+) diff --git a/include/target/target_core_backend.h b/include/target/target_core_backend.h index 39e0114..930f30d 100644 --- a/include/target/target_core_backend.h +++ b/include/target/target_core_backend.h @@ -41,6 +41,8 @@ struct se_subsystem_api { unsigned int (*get_io_opt)(struct se_device *); unsigned char *(*get_sense_buffer)(struct se_cmd *); bool (*get_write_cache)(struct se_device *); + int (*init_prot)(struct se_device *); + void (*free_prot)(struct se_device *); }; struct sbc_ops { diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h index 45412a6..15f402c 100644 --- a/include/target/target_core_base.h +++ b/include/target/target_core_base.h @@ -166,6 +166,7 @@ enum se_cmd_flags_table { SCF_COMPARE_AND_WRITE = 0x0008, SCF_COMPARE_AND_WRITE_POST = 0x0010, SCF_CMD_XCOPY_PASSTHROUGH = 0x0020, + SCF_PROT= 0x0040, }; /* struct se_dev_entry->lun_flags and struct se_lun->lun_access */ @@ -414,6 +415,45 @@ struct se_tmr_req { struct list_headtmr_list; }; +enum target_prot_op { + TARGET_PROT_NORMAL, + TARGET_PROT_READ_INSERT, + TARGET_PROT_WRITE_INSERT, + TARGET_PROT_READ_STRIP, + TARGET_PROT_WRITE_STRIP, + TARGET_PROT_READ_PASS, + TARGET_PROT_WRITE_PASS, +}; + +enum target_prot_type { + TARGET_DIF_TYPE0_PROT, + TARGET_DIF_TYPE1_PROT, + TARGET_DIF_TYPE2_PROT, + TARGET_DIF_TYPE3_PROT, +}; + +enum target_prot_version { + TARGET_DIF_V1 = 1, + TARGET_DIF_V2 = 2, +}; + +enum target_guard_type { + TARGET_DIX_GUARD_CRC = 1, + TARGET_DIX_GUARD_IP = 2, +}; + +enum target_pi_error { + TARGET_GUARD_CHECK_FAILED = 0x1, + TARGET_APPTAG_CHECK_FAILED = 0x2, + TARGET_REFTAG_CHECK_FAILED = 0x3, +}; + +struct se_dif_v1_tuple { + __be16 guard_tag; + __be16 app_tag; + __be32 ref_tag; +}; + struct se_cmd { /* SAM response code being sent to initiator */ u8 scsi_status; @@ -498,6 +538,20 @@ struct se_cmd { /* Used for lun->lun_ref counting */ boollun_ref_active; + + /* DIF related members */ + enum target_prot_op prot_op; + enum target_prot_type prot_type; + enum target_guard_type bg_type; + u16 bg_seed; + u16 reftag_seed; + u32 apptag_seed; + u32 prot_length; + struct scatterlist *t_prot_sg; + unsigned intt_prot_nents; + boolprot_interleaved; + enum target_pi_errorpi_err; + u32 block_num; }; struct se_ua { @@ -609,6 +663,9 @@ struct se_dev_attrib { int emulate_tpws; int emulate_caw; int emulate_3pc; + enum target_prot_type pi_prot_type; + enum target_prot_version pi_prot_version; + enum target_guard_type pi_guard_type; int enforce_pr_isids; int is_nonrot; int emulate_rest_reord; @@ -739,6 +796,8 @@ struct se_device { /* Linked list for struct se_hba struct se_device list */ struct list_headdev_list; struct se_lun xcopy_lun; + /* Protection Information */ + int prot_length; }; struct se_hba { -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 03/14] target/sbc: Add sbc_check_prot + update sbc_parse_cdb for DIF
From: Nicholas Bellinger This patch adds sbc_check_prot() for performing various DIF related CDB sanity checks, along with setting SCF_PROT once sanity checks have passed. Also, add calls in sbc_parse_cdb() for READ_[10,12,16] + WRITE_[10,12,16] to perform DIF sanity checking. Cc: Martin K. Petersen Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Sagi Grimberg Cc: Or Gerlitz Signed-off-by: Nicholas Bellinger --- drivers/target/target_core_sbc.c | 39 ++ 1 file changed, 39 insertions(+) diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c index 52ae54e..600ffcb 100644 --- a/drivers/target/target_core_sbc.c +++ b/drivers/target/target_core_sbc.c @@ -563,6 +563,27 @@ sbc_compare_and_write(struct se_cmd *cmd) return TCM_NO_SENSE; } +bool +sbc_check_prot(struct se_device *dev, struct se_cmd *cmd, unsigned char *cdb) +{ + if (!dev->dev_attrib.pi_prot_type) + return true; + + if (dev->dev_attrib.pi_prot_type == TARGET_DIF_TYPE2_PROT && + (cdb[1] & 0xe0)) + return false; + + if (!(cdb[1] & 0xe0)) { + pr_warn("Target: Unprotected READ/WRITE to DIF device\n"); + return true; + } + if (!cmd->t_prot_sg || !cmd->t_prot_nents) + return true; + + cmd->se_cmd_flags |= SCF_PROT; + return true; +} + sense_reason_t sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) { @@ -581,6 +602,9 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) cmd->execute_cmd = sbc_execute_rw; break; case READ_10: + if (!sbc_check_prot(dev, cmd, cdb)) + return TCM_UNSUPPORTED_SCSI_OPCODE; + sectors = transport_get_sectors_10(cdb); cmd->t_task_lba = transport_lba_32(cdb); cmd->se_cmd_flags |= SCF_SCSI_DATA_CDB; @@ -588,6 +612,9 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) cmd->execute_cmd = sbc_execute_rw; break; case READ_12: + if (!sbc_check_prot(dev, cmd, cdb)) + return TCM_UNSUPPORTED_SCSI_OPCODE; + sectors = transport_get_sectors_12(cdb); cmd->t_task_lba = transport_lba_32(cdb); cmd->se_cmd_flags |= SCF_SCSI_DATA_CDB; @@ -595,6 +622,9 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) cmd->execute_cmd = sbc_execute_rw; break; case READ_16: + if (!sbc_check_prot(dev, cmd, cdb)) + return TCM_UNSUPPORTED_SCSI_OPCODE; + sectors = transport_get_sectors_16(cdb); cmd->t_task_lba = transport_lba_64(cdb); cmd->se_cmd_flags |= SCF_SCSI_DATA_CDB; @@ -610,6 +640,9 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) break; case WRITE_10: case WRITE_VERIFY: + if (!sbc_check_prot(dev, cmd, cdb)) + return TCM_UNSUPPORTED_SCSI_OPCODE; + sectors = transport_get_sectors_10(cdb); cmd->t_task_lba = transport_lba_32(cdb); if (cdb[1] & 0x8) @@ -619,6 +652,9 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) cmd->execute_cmd = sbc_execute_rw; break; case WRITE_12: + if (!sbc_check_prot(dev, cmd, cdb)) + return TCM_UNSUPPORTED_SCSI_OPCODE; + sectors = transport_get_sectors_12(cdb); cmd->t_task_lba = transport_lba_32(cdb); if (cdb[1] & 0x8) @@ -628,6 +664,9 @@ sbc_parse_cdb(struct se_cmd *cmd, struct sbc_ops *ops) cmd->execute_cmd = sbc_execute_rw; break; case WRITE_16: + if (!sbc_check_prot(dev, cmd, cdb)) + return TCM_UNSUPPORTED_SCSI_OPCODE; + sectors = transport_get_sectors_16(cdb); cmd->t_task_lba = transport_lba_64(cdb); if (cdb[1] & 0x8) -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Terrible performance of sequential O_DIRECT 4k writes in SAN environment. ~3 times slower then Solars 10 with the same HBA/Storage.
On Wed 08-01-14 19:30:38, Sergey Meirovich wrote: > On 8 January 2014 17:26, Christoph Hellwig wrote: > > > > On my laptop SSD I get the following results (sometimes up to 200MB/s, > > sometimes down to 100MB/s, always in the 40k to 50k IOps range): > > > > time elapsed (sec.):5 > > bandwidth (MiB/s): 160.00 > > IOps: 40960.00 > > Any direct attached storage I've tried was faster for me as well, > indeed. I have already posted IIRC > "06:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS > 2208 [Thunderbolt] (rev 05)" - 1Gb BBU RAM > sysbench seqwr aio 4k: 326.24Mb/sec 20879.56 Requests/sec > > That is good that you mentioned SSD. I've tried fnic HBA zoned to EMC > XtremIO (SSD only based storage) > 14.43Mb/sec 3693.65 Requests/sec for sequential 4k. You see big degradation only in SAN environments because they have generally higher latency to complete a single request. And given appending direct IO is completely synchronous, the latency is the only thing that really matters for performance. I've also seen my desktop-grade SATA drive perform better than some enterprise grade SAN for this particular workload... > So far I've seen so massive degradation only in SAN environment. I > started my investigation with RHEL6.5 kernel so below table is from it > but the trend is the same as for mainline it seems. > > Chunk size Bandwidth MiB/s > > 64M512 > 32M510 > 16M492 > 8M 451 > 4M 436 > 2M 350 > 1M 256 > 512K 191 > 256K 165 > 128K 142 > 64K 101 > 32K 65 > 16K 39 > 8K 20 > 4K 11 Yes, that's expected. The latency to complete a request consists of some fixed overhead + time to write data. So for small request sizes the latency is constant (corresponding to bandwidth growing linearly with the request size) and for larger request sizes latency somewhat grows so bandwidth grows slower and slower (as the time to write the data forms larger and larger part of the total latency)... Honza -- Jan Kara SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
I have a very lucrative Business
ATTN: Dear, I am Dr. Musa Hamed Executive Director of the Bank of Africa Plc B.O.A,Cotonou Benin Rep. I have a very lucrative Business proposal worth of 21,500,000.00 USD. An Iraqi named Hamadi Hashem a business man made a fixed deposit of 21,500,000.00 USD . Upon maturity several notices was sent to him, even during the war,eleven years ago (2003) up till this date. Again after the war another notification was sent and still no response came from him. it has been found out that Hamadi Hashem and his family had been killed during the war in bomb blast that hit their home at Mukaradeeb where his personal oil well was. I am prepared to split the funds with you 50%:50%.By placing you as the NEXT OF KIN to his deposit. Should you be interested in doing this deal with me, please send me the following information: (1) Your Name (2) Resident Address (3) Your Occupation (4) Your Phone Number (5) Date of Birth (6) Resident Country And I will prefer you to reach me privately and securely on my personal email address: dr.musahame...@yahoo.fr Tel: +229-988-400-97 I will provide you with further information as soon as I hear positively from you. Regards, Dr. Musa Hamed -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: REQ_PM vs REQ_TYPE_PM_RESUME
On Wed, 8 Jan 2014, Phillip Susi wrote: > > In essence, you want your disk's state to move directly from > > unpowered (system asleep) to runtime-suspended with no intermediate > > active -- but only if the disk doesn't spin up all by itself. > > > > One problem: How can the kernel tell whether the disk has spun up > > by itself? I don't think it can be done at the SCSI level. Can it > > be done at the ATA level (and without using the SCSI stack, as that > > would require the disk to go out of runtime suspend and spin up > > anyway)? > > You issue a REQUEST SENSE command and that returns status indicating > whether the drive is stopped, or in standby. See my patches. One of I never saw your patches. Where were they posted? If you issue the REQUEST SENSE command in the usual way (going through the SCSI and block layers), and the disk is already in runtime suspend, it won't do what you want. The block layer won't deliver requests until the device leaves the RPM_SUSPENDED state. In addition, when it receives the command the block layer will submit a deferred runtime-resume request, which rather defeats the whole purpose. (I guess you actually saw some of this happen, and that's what led to this email thread in the first place.) It's a knotty situation. The only way to find out whether the disk is spinning is to ask it, which requires doing I/O, which requires spinning up the disk. Maybe we need to add a mechanism to the block layer's runtime PM implementation for addressing just this situation. > them fixes the scsi-ata translation to issue the ATA CHECK power > command to see if the drive is suspended, as SAT-3 calls for. What happens with non-ATA disks? > > Actually, it does have a reason. Namely, you told it earlier to > > assume the disk will be needed again if it was used within the last > > 5 minutes. At the time of the system resume, the disk was last > > used 3 minutes ago (not counting the time the system was asleep). > > You're looking at it backwards. You aren't telling it to assume the > disk will be needed again if it was used within the last 5 minutes; > you are telling it that it can assume it *won't* be needed again soon > if not used in the last 5 minutes, and therefore, it is ok to shut it > down. Okay, yes, the autosuspend timeout affects when devices should be powered down, not when they should be powered back up. > Also the act of suspending the system is a pretty clear breaking point > in general activity. You normally stop your work and quiesce the > system before you suspend it. If I finish my work and copy it to my > backup drive, then suspend the system, and my wife comes by an hour > later to browse the web, she's not going to be touching my backup disk. While that's true, it's also true that any userspace processes running before the system suspend will continue running, with no perceptible break, after the system wakes up. If they were accessing the disk before, they will most likely continue to access it afterward. But never mind all this; it's a pointless discussion. The real question is how to implement what you want. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: REQ_PM vs REQ_TYPE_PM_RESUME
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 1/8/2014 4:21 PM, Alan Stern wrote: > I never saw your patches. Where were they posted? Higher up in this thread on linux-ide and linux-scsi. Subject was Let sleeping disks lie. > If you issue the REQUEST SENSE command in the usual way (going > through the SCSI and block layers), and the disk is already in > runtime suspend, it won't do what you want. The block layer won't > deliver requests until the device leaves the RPM_SUSPENDED state. > In addition, when it receives the command the block layer will > submit a deferred runtime-resume request, which rather defeats the > whole purpose. Right, which is why I left the device in the active state and used the pre/post_suspend functions to change the queue into the RPM_SUSPENDING state, then either complete the transition to RPM_SUSPENDED, or bail out and go back to RPM_ACTIVE. > What happens with non-ATA disks? Same thing: they return a sense status that either says they are suspended, stopped, or normal. -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.17 (MingW32) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBAgAGBQJSzcg2AAoJEI5FoCIzSKrw7wcIAJLiliMYnoHLm/dGWHhhfcnb ELJLLAhhWsOIQmHiV1svdy0F7EuKi3KF9qJ9JwCaAspO0w8q24UGhvQW20uSH9Md JliKhn6eBY/NFTctaVP2tnOcc4vevCndJfEScBlWxI82MWMC1TV3lA7xPtcJ5ocX LMBLPXpr0qypisgmu/tAnwTPQVyU4WUkgMmG9us4w3BgCkvI/oXf4oDKFkCNgUye FX16NTZPR6iaIK+YJMG3uKCSD4CQjTnKJNXGR89XKJG+Z9v04jyQgWs7kgvqCov9 F5j+yueyMVlveGPHmNwhElHEPT2UWSoyQbonuOrYQZ+Db+xJBOxzoY8Jgq9lWVI= =+vwf -END PGP SIGNATURE- -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/2] dual scan thread bug fix
On Wed, 2014-01-08 at 10:57 -0500, Alan Stern wrote: > On Wed, 8 Jan 2014, James Bottomley wrote: > > > OK, Agreed, but that means modifying the 1/2 patch with the below. This > > should make the proposed diff to 2/2 correct. > > > > James > > > > --- > > diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c > > index ef3f958..5fad646 100644 > > --- a/drivers/scsi/scsi_scan.c > > +++ b/drivers/scsi/scsi_scan.c > > @@ -417,7 +417,7 @@ static struct scsi_target *scsi_alloc_target(struct > > device *parent, > > + shost->transportt->target_size; > > struct scsi_target *starget; > > struct scsi_target *found_target; > > - int error; > > + int error, ref_got; > > > > starget = kzalloc(size, GFP_KERNEL); > > if (!starget) { > > @@ -466,15 +466,15 @@ static struct scsi_target *scsi_alloc_target(struct > > device *parent, > > return starget; > > > > found: > > - if (!kref_get_unless_zero(&found_target->reap_ref)) > > - /* > > -* release routine already fired. Target is dead, but > > -* STARGET_DEL may not yet be set (set in the release > > -* routine), so set here as well, just in case > > -*/ > > - found_target->state = STARGET_DEL; > > + /* > > +* release routine already fired if kref is zero, so if we can still > > +* take the reference, the target must be alive. If we can't, it must > > +* be dying and we need to wait for a new target > > +*/ > > + ref_got = kref_get_unless_zero(&found_target->reap_ref); > > + > > spin_unlock_irqrestore(shost->host_lock, flags); > > - if (found_target->state != STARGET_DEL) { > > + if (ref_got) { > > put_device(dev); > > return found_target; > > } > > @@ -482,8 +482,8 @@ static struct scsi_target *scsi_alloc_target(struct > > device *parent, > > * Unfortunately, we found a dying target; need to wait until it's > > * dead before we can get a new one. There is an anomaly here. We > > * *should* call scsi_target_reap() to balance the kref_get() of the > > -* reap_ref above. However, since the target is in state STARGET_DEL, > > -* it's already invisible and the reap_ref is irrelevant. If we call > > +* reap_ref above. However, since the target being released, it's > > +* already invisible and the reap_ref is irrelevant. If we call > > * scsi_target_reap() we might spuriously do another device_del() on > > * an already invisible target. > > */ > > In fact, most of this comment (everything after the first sentence) is > no longer needed. If the target is dying then kref_get_unless_zero > must have failed, so there is no need to worry about unbalanced > refcounts. I'd like to keep the comment: I get a lot of email from people who write static checkers for this. In principle, I agree, they should treat kref_get_unless_zero() as a spin_trylock(), but it's nice to have something concrete in the code to point to when the email arrives. > Otherwise this looks good. Great, thanks, I'll re-roll and repost. James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] drivers: target: target_core_mod: use div64_u64_rem() instead of operator '%' for u64
On Wed, 2014-01-08 at 08:32 +0100, Hannes Reinecke wrote: > On 12/24/2013 04:35 AM, Chen Gang wrote: > > On 12/23/2013 02:51 PM, Nicholas A. Bellinger wrote: > >> On Sun, 2013-12-22 at 17:17 +0800, Chen Gang wrote: > >>> The related fix patch changed "start_lba = lba % ..." to "start_lba = > >>> lba / ...", and also assumed "segment_size * segment_mult" is still > >>> within u32 (can not cause type over flow). > >>> > >>> I guess the original author already knew about them, and intended to do > >>> like that (if not, please let me know, thanks). > >>> > >> > >> Sorry, your correct that the original code is using modulo division to > >> calculate start_lba. > >> > > > > Oh, that's all right, (in fact, don't need sorry), I am not quite > > familiar with the details, so need related member help check it. :-) > > > >> Hannes, can you please double check this below..? > >> > > > > Please help check when have time, thanks. > > > I would even convert segment_size and segment_mult to u64, > to ensure no overflows occur: > > diff --git a/drivers/target/target_core_alua.c > b/drivers/target/target_core_alua > .c > index 9b1856d..54b1e52 100644 > --- a/drivers/target/target_core_alua.c > +++ b/drivers/target/target_core_alua.c > @@ -477,8 +477,7 @@ static inline int core_alua_state_lba_dependent( > u8 *alua_ascq) > { > struct se_device *dev = cmd->se_dev; > - u32 segment_size, segment_mult, sectors; > - u64 lba; > + u64 segment_size, segment_mult, sectors, lba; > > /* Only need to check for cdb actually containing LBAs */ > if (!(cmd->se_cmd_flags & SCF_SCSI_DATA_CDB)) > > Will squash the above into the original patch shortly in for-next.. > Other than that the sector_div() patch is correct. > Thanks for confirming that sector_div() is correct here vs. the original code using modulo that Chen had pointed out. Thanks Hannes! --nab -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3.12 033/118] usb: xhci: Link TRB must not occur within a USB payload burst
On 01/03/2014 03:29 PM, Sarah Sharp wrote: > I'll let you know when I have some diagnostic patches ready. Hi Sarah. I see today gregkh committed the patches you've already sent me, so I assume someone (other than me) has tested those patches and discovered some benefit from them? I'm still wondering if I'm suffering from hardware quirks. From the first day I installed my usb3 adapter card and the usb3 disk docking station I've noticed some quirky behavior. e.g. I boot the machine with the docking station powered-off, and then later I power it on, the usb3 disk is not detected at all -- until I reboot the machine with the docking station still powered on. Minor stuff, yes, but maybe relevant? I dunno. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: REQ_PM vs REQ_TYPE_PM_RESUME
On 01/09/2014 05:21 AM, Alan Stern wrote: > On Wed, 8 Jan 2014, Phillip Susi wrote: >> You issue a REQUEST SENSE command and that returns status indicating >> whether the drive is stopped, or in standby. See my patches. One of > > I never saw your patches. Where were they posted? > > If you issue the REQUEST SENSE command in the usual way (going through > the SCSI and block layers), and the disk is already in runtime suspend, > it won't do what you want. The block layer won't deliver requests > until the device leaves the RPM_SUSPENDED state. In addition, when it > receives the command the block layer will submit a deferred > runtime-resume request, which rather defeats the whole purpose. > > (I guess you actually saw some of this happen, and that's what led to > this email thread in the first place.) > > It's a knotty situation. The only way to find out whether the disk is > spinning is to ask it, which requires doing I/O, which requires > spinning up the disk. Maybe we need to add a mechanism to the block > layer's runtime PM implementation for addressing just this situation. I think it's knotty because the runtime PM status is a view from the kernel/host side, i.e. it is runtime suspended if it is not being used, no matter which power state it is in. The trigger for the PM state transition are all based on this, if we want to do it the other way around(update device's runtime PM status based on device's actual power state), we are in a knotty situation. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] iscsi: fix wrong order of opcode and itt in iscsi_handle_reject prompt
This patch makes reject messages show right value for opcode and itt, which is converse previously. Signed-off-by: Vaughan Cao --- drivers/scsi/libiscsi.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index e399561..27547ff 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -1013,13 +1013,13 @@ static int iscsi_handle_reject(struct iscsi_conn *conn, struct iscsi_hdr *hdr, iscsi_conn_printk(KERN_ERR, conn, "pdu (op 0x%x itt 0x%x) rejected " "due to DataDigest error.\n", - rejected_pdu.itt, opcode); + opcode, rejected_pdu.itt); break; case ISCSI_REASON_IMM_CMD_REJECT: iscsi_conn_printk(KERN_ERR, conn, "pdu (op 0x%x itt 0x%x) rejected. Too many " "immediate commands.\n", - rejected_pdu.itt, opcode); + opcode, rejected_pdu.itt); /* * We only send one TMF at a time so if the target could not * handle it, then it should get fixed (RFC mandates that @@ -1059,8 +1059,8 @@ static int iscsi_handle_reject(struct iscsi_conn *conn, struct iscsi_hdr *hdr, default: iscsi_conn_printk(KERN_ERR, conn, "pdu (op 0x%x itt 0x%x) rejected. Reason " - "code 0x%x\n", rejected_pdu.itt, - rejected_pdu.opcode, reject->reason); + "code 0x%x\n", rejected_pdu.opcode, + rejected_pdu.itt, reject->reason); break; } return rc; -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] iscsi: fix wrong order of opcode and itt in iscsi_handle_reject prompt
On 01/08/2014 08:21 PM, Vaughan Cao wrote: > This patch makes reject messages show right value for opcode and itt, which > is converse previously. > > Signed-off-by: Vaughan Cao > --- > drivers/scsi/libiscsi.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c > index e399561..27547ff 100644 > --- a/drivers/scsi/libiscsi.c > +++ b/drivers/scsi/libiscsi.c > @@ -1013,13 +1013,13 @@ static int iscsi_handle_reject(struct iscsi_conn > *conn, struct iscsi_hdr *hdr, > iscsi_conn_printk(KERN_ERR, conn, > "pdu (op 0x%x itt 0x%x) rejected " > "due to DataDigest error.\n", > - rejected_pdu.itt, opcode); > + opcode, rejected_pdu.itt); > break; > case ISCSI_REASON_IMM_CMD_REJECT: > iscsi_conn_printk(KERN_ERR, conn, > "pdu (op 0x%x itt 0x%x) rejected. Too many " > "immediate commands.\n", > - rejected_pdu.itt, opcode); > + opcode, rejected_pdu.itt); > /* >* We only send one TMF at a time so if the target could not >* handle it, then it should get fixed (RFC mandates that > @@ -1059,8 +1059,8 @@ static int iscsi_handle_reject(struct iscsi_conn *conn, > struct iscsi_hdr *hdr, > default: > iscsi_conn_printk(KERN_ERR, conn, > "pdu (op 0x%x itt 0x%x) rejected. Reason " > - "code 0x%x\n", rejected_pdu.itt, > - rejected_pdu.opcode, reject->reason); > + "code 0x%x\n", rejected_pdu.opcode, > + rejected_pdu.itt, reject->reason); > break; > } > return rc; > Thanks. Acked-by: Mike Christie -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html