RE: [PATCH 2/6] be2iscsi: relinquishing control after processing 512 CQE

2014-04-02 Thread Jayamohan Kallickal


-Original Message-
From: Mike Christie [mailto:micha...@cs.wisc.edu] 
Sent: Sunday, March 30, 2014 10:25 PM
To: Jayamohan Kallickal
Cc: jbottom...@parallels.com; linux-scsi@vger.kernel.org; Jayamohan Kallickal; 
Minh Duc Tran; Sony John-N
Subject: Re: [PATCH 2/6] be2iscsi: relinquishing control after processing 512 
CQE

On 03/27/2014 11:39 AM, Jayamohan Kallickal wrote:
> @@ -2323,14 +2319,33 @@ void beiscsi_process_all_cqs(struct 
> work_struct *work)
>  
>  static int be_iopoll(struct blk_iopoll *iop, int budget)  {
> - unsigned int ret;
> + unsigned int ret, num_eq_processed;
>   struct beiscsi_hba *phba;
>   struct be_eq_obj *pbe_eq;
> + struct be_eq_entry *eqe = NULL;
> + struct be_queue_info *eq;
>  
> + num_eq_processed = 0;
>   pbe_eq = container_of(iop, struct be_eq_obj, iopoll);
> + phba = pbe_eq->phba;
> + eq = &pbe_eq->q;
> + eqe = queue_tail_node(eq);
> +
> + hwi_ring_eq_db(phba, eq->id, 1, num_eq_processed, 0, 1);

>Is this right? num_eq_processed will be 0 above. Should this be moved down 
>below to after num_eq_processed has been incremented?

I am unarming the interrupts so that we are not interrupted while in be_iopoll. 
So , this is fine

> +
> + while (eqe->dw[offsetof(struct amap_eq_entry, valid) / 32]
> + & EQE_VALID_MASK) {
> +
> + AMAP_SET_BITS(struct amap_eq_entry, valid, eqe, 0);
> + queue_tail_inc(eq);
> + eqe = queue_tail_node(eq);
> + num_eq_processed++;
> + }

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 5/6] be2iscsi: Fix TCP parameters while connection offloading.

2014-04-02 Thread Jayamohan Kallickal


-Original Message-
From: Mike Christie [mailto:micha...@cs.wisc.edu] 
Sent: Sunday, March 30, 2014 9:54 PM
To: Jayamohan Kallickal
Cc: jbottom...@parallels.com; linux-scsi@vger.kernel.org; Jayamohan Kallickal; 
Minh Duc Tran; Sony John-N
Subject: Re: [PATCH 5/6] be2iscsi: Fix TCP parameters while connection 
offloading.

On 03/27/2014 11:39 AM, Jayamohan Kallickal wrote:
> +int mgmt_open_connection_v1(struct beiscsi_hba *phba,
> +  struct sockaddr *dst_addr,
> +  struct beiscsi_endpoint *beiscsi_ep,
> +  struct be_dma_mem *nonemb_cmd) {
> + struct hwi_controller *phwi_ctrlr;
> + struct hwi_context_memory *phwi_context;
> + struct sockaddr_in *daddr_in = (struct sockaddr_in *)dst_addr;
> + struct sockaddr_in6 *daddr_in6 = (struct sockaddr_in6 *)dst_addr;
> + struct be_ctrl_info *ctrl = &phba->ctrl;
> + struct be_mcc_wrb *wrb;
> + struct tcp_connect_and_offload_in_v1 *req;
> + unsigned short def_hdr_id;
> + unsigned short def_data_id;
> + struct phys_addr template_address = { 0, 0 };
> + struct phys_addr *ptemplate_address;
> + unsigned int tag = 0;
> + unsigned int i, ulp_num;
> + unsigned short cid = beiscsi_ep->ep_cid;
> + struct be_sge *sge;
> +
> + phwi_ctrlr = phba->phwi_ctrlr;
> + phwi_context = phwi_ctrlr->phwi_ctxt;
> +
> + ulp_num = phwi_ctrlr->wrb_context[BE_GET_CRI_FROM_CID(cid)].ulp_num;
> +
> + def_hdr_id = (unsigned short)HWI_GET_DEF_HDRQ_ID(phba, ulp_num);
> + def_data_id = (unsigned short)HWI_GET_DEF_BUFQ_ID(phba, ulp_num);
> +
> + ptemplate_address = &template_address;
> + ISCSI_GET_PDU_TEMPLATE_ADDRESS(phba, ptemplate_address);
> + spin_lock(&ctrl->mbox_lock);
> + tag = alloc_mcc_tag(phba);
> + if (!tag) {
> + spin_unlock(&ctrl->mbox_lock);
> + return tag;
> + }
> + wrb = wrb_from_mccq(phba);
> + sge = nonembedded_sgl(wrb);
> +
> + req = nonemb_cmd->va;
> + memset(req, 0, sizeof(*req));
> + wrb->tag0 |= tag;
> +
> + be_wrb_hdr_prepare(wrb, sizeof(*req), false, 1);
> + be_cmd_hdr_prepare(&req->hdr, CMD_SUBSYSTEM_ISCSI,
> +OPCODE_COMMON_ISCSI_TCP_CONNECT_AND_OFFLOAD,
> +sizeof(*req));
> + if (dst_addr->sa_family == PF_INET) {
> + __be32 s_addr = daddr_in->sin_addr.s_addr;
> + req->ip_address.ip_type = BE2_IPV4;
> + req->ip_address.addr[0] = s_addr & 0x00ff;
> + req->ip_address.addr[1] = (s_addr & 0xff00) >> 8;
> + req->ip_address.addr[2] = (s_addr & 0x00ff) >> 16;
> + req->ip_address.addr[3] = (s_addr & 0xff00) >> 24;
> + req->tcp_port = ntohs(daddr_in->sin_port);
> + beiscsi_ep->dst_addr = daddr_in->sin_addr.s_addr;
> + beiscsi_ep->dst_tcpport = ntohs(daddr_in->sin_port);
> + beiscsi_ep->ip_type = BE2_IPV4;
> + } else if (dst_addr->sa_family == PF_INET6) {
> + req->ip_address.ip_type = BE2_IPV6;
> + memcpy(&req->ip_address.addr,
> +&daddr_in6->sin6_addr.in6_u.u6_addr8, 16);
> + req->tcp_port = ntohs(daddr_in6->sin6_port);
> + beiscsi_ep->dst_tcpport = ntohs(daddr_in6->sin6_port);
> + memcpy(&beiscsi_ep->dst6_addr,
> +&daddr_in6->sin6_addr.in6_u.u6_addr8, 16);
> + beiscsi_ep->ip_type = BE2_IPV6;
> + } else{
> + beiscsi_log(phba, KERN_ERR, BEISCSI_LOG_CONFIG,
> + "BG_%d : unknown addr family %d\n",
> + dst_addr->sa_family);
> + spin_unlock(&ctrl->mbox_lock);
> + free_mcc_tag(&phba->ctrl, tag);
> + return -EINVAL;
> +
> + }
> + req->cid = cid;
> + i = phba->nxt_cqid++;
> + if (phba->nxt_cqid == phba->num_cpus)
> + phba->nxt_cqid = 0;
> + req->cq_id = phwi_context->be_cq[i].id;
> + beiscsi_log(phba, KERN_INFO, BEISCSI_LOG_CONFIG,
> + "BG_%d : i=%d cq_id=%d\n", i, req->cq_id);
> + req->defq_id = def_hdr_id;
> + req->hdr_ring_id = def_hdr_id;
> + req->data_ring_id = def_data_id;
> + req->do_offload = 1;
> + req->dataout_template_pa.lo = ptemplate_address->lo;
> + req->dataout_template_pa.hi = ptemplate_address->hi;
> + sge->pa_hi = cpu_to_le32(upper_32_bits(nonemb_cmd->dma));
> + sge->pa_lo = cpu_to_le32(nonemb_cmd->dma & 0x);
> + sge->len = cpu_to_le32(nonemb_cmd->size);
> +
> + req->hdr.version = 1;
> + req->tcp_window_size = 0;
> + req->tcp_window_scale_count = 2;

>Is this approx 100 lines exactly the same as mgmt_open_connection except these 
>3 lines? Maybe a lib/helper function that does the common 100 lines then >some 
>v1 function over that which does the extra 3 lines.

Agreed. Will redo this

Thanks
Jay
--
--
To unsubscribe from this list: send the l

RE: [PATCH 3/6] be2iscsi: Fix MCCQ posting for Mbx-Cmd after driver initialization is complete

2014-04-02 Thread Jayamohan Kallickal


-Original Message-
From: Mike Christie [mailto:micha...@cs.wisc.edu] 
Sent: Thursday, March 27, 2014 5:46 PM
To: Jayamohan Kallickal; jbottom...@parallels.com; linux-scsi@vger.kernel.org
Cc: Jayamohan Kallickal; Sony John-N
Subject: Re: [PATCH 3/6] be2iscsi: Fix MCCQ posting for Mbx-Cmd after driver 
initialization is complete

On 03/27/2014 09:39 AM, Jayamohan Kallickal wrote:
>  Even before probe for function was completed, iSCSI Daemon had 
> initiated login  to target while OS was coming up. The targets which 
> had node.startup=automatic,  login process was initiated.Since 
> function specific initialization was still in  progress this lead to kernel 
> panic.
>  The reson was MCCQ was not yet created and login Mbx_Cmd was sent on MCCQ.
>  wrb_from_mccq() failed with kernel-panic access.
> 
>  In the fix checking in alloc_mcc_tag() if driver initialization is complete 
> or not.
>  Based on the result driver continues or return with an Error.
> 
> Signed-off-by: John Soni Jose 
> Signed-off-by: Jayamohan Kallickal 
> ---
>  drivers/scsi/be2iscsi/be_cmds.c | 8   
> drivers/scsi/be2iscsi/be_main.c | 2 ++  
> drivers/scsi/be2iscsi/be_main.h | 1 +
>  3 files changed, 11 insertions(+)
> 
> diff --git a/drivers/scsi/be2iscsi/be_cmds.c 
> b/drivers/scsi/be2iscsi/be_cmds.c index 1432ed5..4cabf5f 100644
> --- a/drivers/scsi/be2iscsi/be_cmds.c
> +++ b/drivers/scsi/be2iscsi/be_cmds.c
> @@ -118,6 +118,14 @@ unsigned int alloc_mcc_tag(struct beiscsi_hba 
> *phba)  {
>   unsigned int tag = 0;
>  
> +
> + if (phba->state & BE_ADAPTER_INIT_PHASE) {
> + beiscsi_log(phba, KERN_ERR,
> + BEISCSI_LOG_INIT | BEISCSI_LOG_CONFIG,
> + "BC_%d : Driver Initialization still in 
> progress\n");
> + return tag;
> + }
> +

>I think normally we do not call scsi_host_add and expose the host until the 
>driver can handle requests from entry points like sysfs and queuecommand.

>Can you just move iscsi_host_add from beiscsi_hba_alloc to beiscsi_dev_probe 
>after beiscsi_setup_boot_info is called?

Tried moving iscsi_add_host and it did not help the issue where the ep_connect 
was being called before the probe is completed.

Chris Leech has a solution for this and when applied along with this patch 
works for us.


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 02/55] scsi: Mark function as static in be2iscsi/be_cmds.c

2014-04-02 Thread Jayamohan Kallickal
Looks good

Acked-by: Jayamohan Kallickal 
---

-Original Message-
From: Rashika Kheria [mailto:rashika.khe...@gmail.com] 
Sent: Saturday, March 29, 2014 10:44 AM
To: linux-ker...@vger.kernel.org
Cc: Jayamohan Kallickal; James E.J. Bottomley; linux-scsi@vger.kernel.org; 
j...@joshtriplett.org
Subject: [PATCH 02/55] scsi: Mark function as static in be2iscsi/be_cmds.c

Mark function as static in be2iscsi/be_cmds.c because it is not used outside 
this file.

This eliminates the following warning in be2iscsi/be_cmds.c:
drivers/scsi/be2iscsi/be_cmds.c:401:5: warning: no previous prototype for 
‘beiscsi_process_mcc’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria 
Reviewed-by: Josh Triplett 
---
 drivers/scsi/be2iscsi/be_cmds.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/be2iscsi/be_cmds.c b/drivers/scsi/be2iscsi/be_cmds.c 
index 3338391..026c3e3 100644
--- a/drivers/scsi/be2iscsi/be_cmds.c
+++ b/drivers/scsi/be2iscsi/be_cmds.c
@@ -398,7 +398,7 @@ static void beiscsi_cq_notify(struct beiscsi_hba *phba, u16 
qid, bool arm,  }
 
 
-int beiscsi_process_mcc(struct beiscsi_hba *phba)
+static int beiscsi_process_mcc(struct beiscsi_hba *phba)
 {
struct be_mcc_compl *compl;
int num = 0, status = 0;
--
1.7.9.5



[Announce] sg3_utils-1.38 available

2014-04-02 Thread Douglas Gilbert

sg3_utils is a package of command line utilities for sending
SCSI and some ATA commands to devices. This package targets
the Linux 3, 2.6 and 2.4 kernel series. It also has ports to
FreeBSD, Tru64, Solaris, and Windows (cygwin and MinGW).

Users of sg_xcopy note that the default recipient of the
xcopy(lid1) command has changed from the source to the
destination device. An environment variable [XCOPY_TO_SRC] can
re-instate the previous default. This version tracks various
changes made by www.t10.org since October 2013.

For an overview of sg3_utils and downloads see this page:
http://sg.danny.cz/sg/sg3_utils.html
The sg_ses utility (for enclosure devices) is discussed at:
http://sg.danny.cz/sg/sg_ses.html
A full changelog can be found at:
http://sg.danny.cz/sg/p/sg3_utils.ChangeLog

A release announcement will be sent to freecode.com .

Changelog for sg3_utils-1.38 [20140401] [svn: r563]
  - sg_ses: add --dev-slot-num= and --sas-addr=
- fix --data=- problem with large buffers
- new --data=@FN to read hex data from file FN
- error and warning message cleanup
- add --maxlen= option
  - sg_inq: add --block=0|1 option to control opens
- add --inhex=FN to read response in ASCII hex from
  a file; --inhex=FN --raw reads response in binary
- make -HHH (- for '-p ai') output suitable for
  another sg_inq invocation to use --inhex to decode
- add LU_CONG to standard inquiry response output
- decode ASCII information VPD pages
- add HAW_ZBC in block dev char. VPD page (sbc4r01)
- sync version descriptors dated 20131126
- allow --page=-1 to force std INQUIRY decoding
- fix overflow in encode_whitespaces
- improve unit serial number display (VPD page 0x80)
  - sg_vpd: add LU_CONG to standard inquiry response output
- add --inhex=FN to read response in ASCII hex from
  a file; --inhex=FN --raw reads response in binary
- decode Third Party Copy (tpc) page
- add HAW_ZBC in block dev char. VPD page (sbc4r01)
- add LTO and DDS vendor pages
- allow --page=num to restrict --enumerate output
  - sg_persist: add PROUT: Replace Lost Reservation (spc4r36)
- add --transport-id= for SOP: 'sop,'
  - sg_readcap: for --16 show physical block size if
different from logical block size
  - sg_xcopy: environment variables: XCOPY_TO_SRC and
XCOPY_TO_DST indicate where xcopy command is sent
- change default to send xcopy to dst (was src)
- improve CL handling of short options (e.g. '-vv')
  - sg_luns: guard against garbage response
  - sg_decode_sense: with --nospace ignore spaces on
command line, so multiple arguments are concatenated
  - sg_write_same: repeat if unit attention
  - sg_rtpg: fix indexing bug with --extended option
  - sg_logs: placeholder for pending defects lpage
  - sg_unmap: fix another problem with --grpnum= option
  - sg_lib.h: add PDT_ZBC define (spc4r36p)
  - sg_lib_data: sync asc/ascq codes with T10 dated 20140320
- add pdt string for ZBC (spc4r36p)
  - sg_lib: extensions to sg_get_num() and sg_get_llnum()
  - sg_cmds_extra: fix sa bug in sg_ll_3party_copy_out()
  - scripts/rescan-scsi-bus.sh: check if FC driver exports
issue_lip before using it
- man page added (Linux only)
  - scripts/59-scsi-sg3_utils.rules: linux specific udev rules
  - examples: add sg_tst_excl3 for testing O_EXCL
- improve sg_tst_excl and sg_tst_excl2
- add sg_tst_context for testing file handle contexts
  - upgrade automake to version 1.13.3
  - add suse directory and 'spec' file to facilitate builds

Changelog for sg3_utils-1.37 [20131014] [svn: r522]
...

Doug Gilbert
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] blk-mq: support for shared tags

2014-04-02 Thread Matias Bjorling
On 04/02/2014 12:46 AM, Christoph Hellwig wrote:
> On Tue, Apr 01, 2014 at 05:16:21PM -0700, Matias Bjorling wrote:
>> Hi Christoph,
>>
>> Can you rebase it on top of 3.14. I have trouble applying it for testing.
> 
> Hi Martin,
> 
> the series is based on top of Jens' for-next branch.  I've also pushed out a
> git tree to the blk-mq-share-tags.2 branch of
> 
>   git://git.infradead.org/users/hch/scsi.git
> 
> to make testing and reviewing easier.
>

Thanks.

Regarding the tags API. I think the best approach is a struct
blk_mq_tags_reg. That'll make their parameters very visible in the
drivers. I'll send a patch with the change, using the nvme driver as an
example.

>> For nvme, there's need for two separate types of queues. The admin queue
>> (before initializing blk-mq) and the actual hardware queues.
>>
>> Should we allow the driver to get/put tags before initializing blk-mq?
>> Or let drivers implement their own framework?
> 
> What do you mean with initializing blk-mq?  We need to allocate data
> structures for sure, and I don't see much else in terms of initialization
> in blk-mq.
> 

For the nvme driver, there's a single admin queue, which is outside
blk-mq's control, and the X normal queues. Should we allow the shared
tags structure to be used (get/put) for the admin queue, without
initializing blk-mq? or should the drivers simply implement their own
tags for their admin queue?



--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


New release of SCST/Qlogic target interface driver.

2014-04-02 Thread Dr. Greg Wettstein
Good evening, I hope the week is going well for everyone.

The weather service started mumbling last night about eight more
inches of snow getting dumped on West-Central Minnesota so I figured
if I was going to get a last weekend of cross-country skiing in I had
better finish the week down at the lake before driving on the
interstate got bad.

So I loaded the pickup and pushed Izzy our Golden Retriever into the
cab and headed out.  On the way down he convinced me that since most
of the weekend would be spent working on the chicken cruncher code we
should get a new release of the SCST/Qlogic interface driver out now
that the kinks seemed to be worked out of the Selective Retransmission
Request (SRR) handling.

So any Qlogic HBA customers or storage OEM's who are interested in
running the SCST storage stack on the latest generation Qlogic HBA's
and CNA's can pick up version 1.1 of the SCST Qlogic Adapter (SQA)
interface driver at the following URL:

ftp://ftp.enjellic.com/pub/scst/sqa_driver-1.1.tar.gz

The interface driver has been subjected to enough abuse and traffic at
this point that we elected to remove the 'beta' designation with this
release.  As with any storage product, ** ALWAYS **, evaluate the
product carefully for suitability in the intended environment.

This interface driver was developed for and is in production use in a
large Cisco/Nexus fibre-channel/FCOE environment.  It has been tested
with the Qlogic 8362 CNA and 2672 HBA running in CNA personality mode
as well as the 24xx and 25xx families of fibre-channel adapters.
While not tested I would not anticipate any issues with 2672 HBA's
running in 16 GBPS FC mode.

This driver provides the interface 'glue' which allows SCST to use the
in-kernel qla2xxx core and target driver code.  It plays the same role
for SCST that the tcm_qla2xxx code does for the in-kernel LIO target.

Its now starting to snow out and despite the fact that I'm chucking
wood into the stove as fast as I can it is only about 49 degress in
the cabin.  Izzy has commandeered the bed since its the only warm spot
so I'm going to heat up a couple of pieces of pizza and wrap up in a
blanket in front of the stove.

Best wishes to everyone in Linux storage land for a similarily
enjoyable evening/weekend.

Greg and Izzy

As always,
Dr. Greg Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.   Specializing in information infra-structure
Fargo, ND  58102development.
PH: 701-281-1686
FAX: 701-281-3949   EMAIL: g...@enjellic.com
--
"A computer is like a horse, it will sense weakness."
-- Greg Wettstein
   GAUSSIAN quote.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] target/core: T10-Dif: check HW support capabilities

2014-04-02 Thread Quinn Tran

Regards,
Quinn Tran




On 4/2/14 11:20 AM, "Nicholas A. Bellinger"  wrote:

>On Wed, 2014-04-02 at 09:51 +0300, Sagi Grimberg wrote:
>> On 4/1/2014 8:45 PM, Nicholas A. Bellinger wrote:
>> > On Tue, 2014-04-01 at 20:27 +0300, sagi grimberg wrote:
>> >> On 4/1/2014 8:09 PM, Martin K. Petersen wrote:
>>  "Sagi" == Sagi Grimberg  writes:
>> >>> Sagi> I originally wrote the code to support that. But it got left
>> >>> Sagi> behind since I figured it is not an interesting use-case.  If
>>your
>> >>> Sagi> beckend doesn't support T10-PI why should the target publish
>>it
>> >>> Sagi> support it and ask the device to strip/insert it?  I suppose
>>it is
>> >>> Sagi> to allow the initiator to protect half-way, but I don't know
>>how
>> >>> Sagi> interesting it is if the data is not stored with protection...
>> >>>
>> >>> That depends what you do on the backend. There are several devices
>>out
>> >>> there that expose PI to the host but use a different protection
>>scheme
>> >>> internally. And then synthesize PI on the host-facing side. Some
>>even do
>> >>> T10 PI to an internal protection scheme and then back to T10 PI when
>> >>> talking to the disk drives in the back end.
>> >>>
>> >> Hey Martin,
>> >>
>> >> I understand, but even for internal different T10-PI schemes, is
>> >> stripping protection from incoming data
>> >> at the fabric level (and then do whatever with it in the backend
>>level)
>> >> the right thing to do here?
>> >> I mean we basically lose protection across the PCI with this scheme
>> >> aren't we?
>> >>
>> > The WRITE_STRIP + READ_INSERT case would be still be useful for IBLOCK
>> > backends that don't support real hw PI, so that at least the
>>protection
>> > can be in place for data movement between physical machines.
>> >
>> > Also, I think the amount of changes required to support this type of
>> > configuration in target-core are quite small.
>>
>> So trying to understand how this will come to use.
>> Target will publish the fabric T10-PI support based only on the
>> transport configuration (not accounting the backing devices
>>configuration).
>
>Yes, passing in the transport configuration for PI at
>transport_init_session() time seems to make the most sense here in order
>to address all fabric types.

QT> Ack.  Registering PI cap per IT nexus would fit all fabrics.

>
>> Then upon each cmd the target will look on {backstore configuration,
>> PROTECT bit, transport configuration} - then will decide on protection
>> operation (STRIP/INSERT/PASS).
>>
>> Looks right?
>>
>
>Correct.
>
>I'm thinking it makes sense for target-core to perform the WRITE_INSERT
>+ READ_STRIP (in software) when the transport does not directly support
>PI, but the backend has PI enabled.
>
>--nab

QT> I see, this cover the case of a transport within a Portal does support
PI.

>




This message and any attached documents contain information from QLogic 
Corporation or its wholly-owned subsidiaries that may be confidential. If you 
are not the intended recipient, you may not read, copy, distribute, or use this 
information. If you have received this transmission in error, please notify the 
sender immediately by reply e-mail and then delete this message.
<>

Re: [PATCH 2/6] io: define an interface for IO extensions

2014-04-02 Thread Darrick J. Wong
On Wed, Apr 02, 2014 at 03:53:33PM -0700, Zach Brown wrote:
> > > I'd just remove this generic teardown callback path entirely.  If
> > > there's PI state hanging off the iocb tear it down during iocb teardown.
> > 
> > Hmm, I thought aio_complete /was/ iocb teardown time.
> 
> Well, usually :).  If you build up before aio_run_iocb() then you nead
> to teardown in kiocb_free(), which is also called by aio_complete().

Oh, yeah.  I handle that by tearing down the extensions if stuff fails, though
I don't remember if that was in this version of the patchset.

> > > (Isn't there some allocate-and-copy-from-userspace helper now? But..)
> > 
> >  Is there?  I didn't find one when I looked, but it wasn't an 
> > exhaustive
> > search.
> 
> I could have sworn that I saw something.. ah, right, memdup_user().

Noted. :)

> > > I don't like the rudundancy of the implicit size requirement by a
> > > field's flag being set being duplicated by the explicit size argument.
> > > What does that give us, exactly?
> > 
> > Either another sanity check or another way to screw up, depending on how you
> > look at it.  I'd been considering shortening the size field to u32 and 
> > adding a
> > magic number field, but I wonder if that's really necessary.  Seems like it
> > shouldn't be -- if userland screws up, it's not hard to kill the process.
> > (Or segv it, or...)
> 
> I don't think I'd bother.  The bits should be enough and are already
> necessary to have explicit indicators of fields being set.



> > > Fields in the iocb  As each of these are initialized I'd just
> > > test the presence bits and __get_user() the userspace arguemnts
> > > directly, or copy_from_user() something slightly more complicated on to
> > > the stack.
> > >
> > > That gets rid of us having to care about the size at all.  It stops us
> > > from allocating a kernel copy and pinning it for the duration of the IO.
> > > We'd just be sampling the present userspace arguments as we initialie
> > > the iocb during submission.
> > 
> > I like this idea.  For the PI extension, nothing particularly error-prone
> > happens in teardown, which allows the flexibility to copy_from_user any
> > arguments required, and to copy_to_user any setup errors that happen.  I can
> > get rid a lot of allocate-and-copy nonsense, as you point out.
> > 
> > Ok, I'll migrate my patches towards this strategy, and let's see how much 
> > code
> > goes away. :)
> 
> Cool :).
> 
> > I've also noticed a bug where if you make one of these PI-extended calls on 
> > a
> > file living on a filesystem, it'll extend the io request's range to be
> > filesystem block-aligned, which causes all kinds of havoc with the user
> > provided PI buffers, since they now need to be extended to fit the added
> > blocks.  Alternately, one could require PI IOs to be fs-block aligned when
> > dealing with regular files. 
> 
> I think, like O_DIRECT, it just has to be aligned or fail :(.

Heh.  O_DIRECT is a hilarious maze of twisty unobvious requirements.  Yuck.

#define O_IMNAIVEENOUGHTOTHINKIKNOWWHATTHISDOES O_DIRECT

--D
> 
> - z
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/6] aio/dio: enable PI passthrough

2014-04-02 Thread Darrick J. Wong
On Wed, Apr 02, 2014 at 03:33:11PM -0700, Zach Brown wrote:
> > One thing I'm not sure about: What's the largest IO (in terms of # of 
> > blocks,
> > not # of struct iovecs) that I can throw at the kernel?
> 
> Yeah, dunno.  I'd guess big :).  I'd hope that the PI code already has a
> way to clamp the size of bios if there's a limit to the size of PI data
> that can be managed downstream?

I guess if we restricted the size of the PI buffer to a page's worth of
pointers to struct page, that limits us to 128M on x64 with DIF and 512b
sectors.  That's not really a whole lot; I suppose one could (ab)use vmalloc.

Yes, blk-integrity clamps the size of the bio to fit the downstream device's
maximum integrity sg size.  See max_integrity_segments for details, or the
mostly-undocumented sg_prot_tablesize sysfs attribute that reveals it.

I don't know what a practical limit is; scsi_debug sets it to 65536.

--D
> 
> - z
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/6] io: define an interface for IO extensions

2014-04-02 Thread Zach Brown
> > I'd just remove this generic teardown callback path entirely.  If
> > there's PI state hanging off the iocb tear it down during iocb teardown.
> 
> Hmm, I thought aio_complete /was/ iocb teardown time.

Well, usually :).  If you build up before aio_run_iocb() then you nead
to teardown in kiocb_free(), which is also called by aio_complete().

> > (Isn't there some allocate-and-copy-from-userspace helper now? But..)
> 
>  Is there?  I didn't find one when I looked, but it wasn't an 
> exhaustive
> search.

I could have sworn that I saw something.. ah, right, memdup_user().

> > I don't like the rudundancy of the implicit size requirement by a
> > field's flag being set being duplicated by the explicit size argument.
> > What does that give us, exactly?
> 
> Either another sanity check or another way to screw up, depending on how you
> look at it.  I'd been considering shortening the size field to u32 and adding 
> a
> magic number field, but I wonder if that's really necessary.  Seems like it
> shouldn't be -- if userland screws up, it's not hard to kill the process.
> (Or segv it, or...)

I don't think I'd bother.  The bits should be enough and are already
necessary to have explicit indicators of fields being set.

> > Fields in the iocb  As each of these are initialized I'd just
> > test the presence bits and __get_user() the userspace arguemnts
> > directly, or copy_from_user() something slightly more complicated on to
> > the stack.
> >
> > That gets rid of us having to care about the size at all.  It stops us
> > from allocating a kernel copy and pinning it for the duration of the IO.
> > We'd just be sampling the present userspace arguments as we initialie
> > the iocb during submission.
> 
> I like this idea.  For the PI extension, nothing particularly error-prone
> happens in teardown, which allows the flexibility to copy_from_user any
> arguments required, and to copy_to_user any setup errors that happen.  I can
> get rid a lot of allocate-and-copy nonsense, as you point out.
> 
> Ok, I'll migrate my patches towards this strategy, and let's see how much code
> goes away. :)

Cool :).

> I've also noticed a bug where if you make one of these PI-extended calls on a
> file living on a filesystem, it'll extend the io request's range to be
> filesystem block-aligned, which causes all kinds of havoc with the user
> provided PI buffers, since they now need to be extended to fit the added
> blocks.  Alternately, one could require PI IOs to be fs-block aligned when
> dealing with regular files. 

I think, like O_DIRECT, it just has to be aligned or fail :(.

- z
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/6] aio/dio: enable PI passthrough

2014-04-02 Thread Zach Brown
> One thing I'm not sure about: What's the largest IO (in terms of # of blocks,
> not # of struct iovecs) that I can throw at the kernel?

Yeah, dunno.  I'd guess big :).  I'd hope that the PI code already has a
way to clamp the size of bios if there's a limit to the size of PI data
that can be managed downstream?

- z
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/6] io: define an interface for IO extensions

2014-04-02 Thread Darrick J. Wong
On Wed, Apr 02, 2014 at 12:49:47PM -0700, Zach Brown wrote:
> > @@ -916,6 +921,17 @@ void aio_complete(struct kiocb *iocb, long res, long 
> > res2)
> > struct io_event *ev_page, *event;
> > unsigned long   flags;
> > unsigned tail, pos;
> > +   int ret;
> > +
> > +   ret = io_teardown_extensions(iocb);
> > +   if (ret) {
> > +   if (!res)
> > +   res = ret;
> > +   else if (!res2)
> > +   res2 = ret;
> > +   else
> > +   pr_err("error %d tearing down aio extensions\n", ret);
> > +   }
> 
> This ends up trying to copy the kernel's io_extension copy back to
> userspace from interrupts, which obviously won't fly.
> 
> And to what end?  So that maybe someone can later add an 'extension'
> that can fill in some field that's then copied to userspace?  But by
> copying the entire argument struct back?
> 
> Let's not get ahead of ourselves.  If they're going to try and give
> userspace some feedback after IO completion they're going to have to try
> a lot harder because they don't have acces to the submitting task
> context anymore.  They'd have to pin some reference to a feedback
> mechanism in the in-flight io.  I think we'd want that explicit in the
> iocb, not hiding off on the other side of this extension interface.

I think we'd want to find an extension that really needs this.  PI doesn't.
We can skate by without supporting the teardown errors case for now.

> I'd just remove this generic teardown callback path entirely.  If
> there's PI state hanging off the iocb tear it down during iocb teardown.

Hmm, I thought aio_complete /was/ iocb teardown time.

> > +struct io_extension_type {
> > +   unsigned int type;
> > +   unsigned int extension_struct_size;
> > +   int (*setup_fn)(struct kiocb *, int is_write);
> > +   int (*destroy_fn)(struct kiocb *);
> > +};
> 
> I'd also get rid of all of this.  More below.
> 
> > +static int io_setup_extensions(struct kiocb *req, int is_write,
> > +  struct io_extension __user *ioext)
> > +{
> > +   struct io_extension_type *iet;
> > +   __u64 sz, has;
> > +   int ret;
> > +
> > +   /* Check size of buffer */
> > +   if (unlikely(copy_from_user(&sz, &ioext->ie_size, sizeof(sz
> > +   return -EFAULT;
> > +   if (sz > PAGE_SIZE ||
> > +   sz > sizeof(struct io_extension) ||
> > +   sz < IO_EXT_SIZE(ie_has))
> > +   return -EINVAL;
> > +
> > +   /* Check that the buffer's big enough */
> > +   if (unlikely(copy_from_user(&has, &ioext->ie_has, sizeof(has
> > +   return -EFAULT;
> > +   ret = io_check_bufsize(has, sz);
> > +   if (ret)
> > +   return ret;
> > +
> > +   /* Copy from userland */
> > +   req->ki_ioext = kzalloc(sizeof(struct kio_extension), GFP_NOIO);
> > +   if (!req->ki_ioext)
> > +   return -ENOMEM;
> > +
> > +   req->ki_ioext->ke_user = ioext;
> > +   if (unlikely(copy_from_user(&req->ki_ioext->ke_kern, ioext, sz))) {
> > +   ret = -EFAULT;
> > +   goto out;
> > +   }
> 
> (Isn't there some allocate-and-copy-from-userspace helper now? But..)

 Is there?  I didn't find one when I looked, but it wasn't an exhaustive
search.

> I don't like the rudundancy of the implicit size requirement by a
> field's flag being set being duplicated by the explicit size argument.
> What does that give us, exactly?

Either another sanity check or another way to screw up, depending on how you
look at it.  I'd been considering shortening the size field to u32 and adding a
magic number field, but I wonder if that's really necessary.  Seems like it
shouldn't be -- if userland screws up, it's not hard to kill the process.
(Or segv it, or...)

> Our notion of the total size only seems to only matter if we're copying
> the entire struct from userspace and I'm don't think we need to do that.
> 
> For each argument, we're translating it into some kernel equivalent,
> right?

Yes.

> Fields in the iocb  As each of these are initialized I'd just
> test the presence bits and __get_user() the userspace arguemnts
> directly, or copy_from_user() something slightly more complicated on to
> the stack.
>
> That gets rid of us having to care about the size at all.  It stops us
> from allocating a kernel copy and pinning it for the duration of the IO.
> We'd just be sampling the present userspace arguments as we initialie
> the iocb during submission.

I like this idea.  For the PI extension, nothing particularly error-prone
happens in teardown, which allows the flexibility to copy_from_user any
arguments required, and to copy_to_user any setup errors that happen.  I can
get rid a lot of allocate-and-copy nonsense, as you point out.

Ok, I'll migrate my patches towards this strategy, and let's see how much code
goes away. :)

I've also noticed a bug where if you make one of these PI-extended calls on a
file living on a filesystem, it'll extend the io request's range to be
filesystem block-aligned, which causes all kinds 

RE: [PATCH 0/7] Performance improvements for LSI SCSI cards

2014-04-02 Thread Krishnamoorthy, Praveen
> CC few more LSI driver developers.

I have reviewed the patches. It looks good.

Sreekanth - Please ACK once you are done.

Reviewed-by: Praveen Krishnamoorthy 

N�r��yb�X��ǧv�^�)޺{.n�+{���"�{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj"��!�i

Re: [PATCH 2/6] io: define an interface for IO extensions

2014-04-02 Thread Darrick J. Wong
On Wed, Apr 02, 2014 at 03:22:20PM -0400, Jeff Moyer wrote:
> "Darrick J. Wong"  writes:
> 
> > Define a generic interface to allow userspace to attach metadata to an
> > IO operation.  This interface will be used initially to implement
> > protection information (PI) pass through, though it ought to be usable
> > by anyone else desiring to extend the IO interface.  It should not be
> > difficult to modify the non-AIO calls to use this mechanism.
> 
> My main issue with this patch is determining what exactly gets returned
> to userspace when there is an issue in the teardown_extensions path.
> It looks like you'll get the first error propagated from
> io_teardown_extensions, others are ignored.  Then, in aio_complete, if
> there was no error with the I/O, then you'll get the teardown error
> reported in event->res, otherwise you'll get it in event->res2.  So,
> what are the valid errors returned by the teardown routine for
> extensions?  How is the userspace app supposed to determine where the
> error came from, the I/O or a failure in the extension teardown?

There's also the question of which extension spat out the error.  One solution
would be to augment struct io_extension with all the error fields that we want
(an extension can declare its own if needed) as we do now, and if errors happen
during setup, we can just copy_to_user them back.  If nothing else fails with
the IO setup, the setup routine can return -EINVAL, and userspace can look for
updated error fields in the struct.

Unfortunately for the teardown error case you'd have to pin the whole page in
memory for the duration of the IO just to have it around.  For now this isn't a
problem because teardown can't fail anyway.

> I think it may make sense to only use res2 for reporting io extension
> teardown failures.  Any new code that will use extensions can certainly
> be written to check both res and res2, and this method would prevent the
> ambiguity I mentioned.

Hmm, doesn't look like anyone actually uses res2 except for USB gadgets.

It's tempting just to shove the first ioextension error code that comes along
into res2 and abort the whole thing, and let userspace guess where the res2
code came from.  I think there's an additional problem with stuffing return
codes: in the case of synchronous IO syscalls, we'd have to deal with how to
cram error codes from (potentially) multiple sources into the single return
value, while not giving userspace any help as to where the code came from.

Now that I've written all that out, I don't like this idea so I'll drop it. :)

> Finally, I know this is an RFC, but please add some man-page changes to
> your patch set, and CC linux-man.  Michael Kerrisk typically has
> valuable advice on new APIs.

I'll do that the next time I rev the patches.  Thank you for the suggestion.

--D
> 
> Cheers,
> Jeff
> 
> >
> > Signed-off-by: Darrick J. Wong 
> > ---
> >  fs/aio.c |  180 
> > +-
> >  include/linux/aio.h  |7 ++
> >  include/uapi/linux/aio_abi.h |   15 +++-
> >  3 files changed, 197 insertions(+), 5 deletions(-)
> >
> >
> > diff --git a/fs/aio.c b/fs/aio.c
> > index 062a5f6..0c40bdc 100644
> > --- a/fs/aio.c
> > +++ b/fs/aio.c
> > @@ -158,6 +158,11 @@ static struct vfsmount *aio_mnt;
> >  static const struct file_operations aio_ring_fops;
> >  static const struct address_space_operations aio_ctx_aops;
> >  
> > +static int io_teardown_extensions(struct kiocb *req);
> > +static int io_setup_extensions(struct kiocb *req, int is_write,
> > +  struct io_extension __user *ioext);
> > +static int iocb_setup_extensions(struct iocb *iocb, struct kiocb *req);
> > +
> >  static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages)
> >  {
> > struct qstr this = QSTR_INIT("[aio]", 5);
> > @@ -916,6 +921,17 @@ void aio_complete(struct kiocb *iocb, long res, long 
> > res2)
> > struct io_event *ev_page, *event;
> > unsigned long   flags;
> > unsigned tail, pos;
> > +   int ret;
> > +
> > +   ret = io_teardown_extensions(iocb);
> > +   if (ret) {
> > +   if (!res)
> > +   res = ret;
> > +   else if (!res2)
> > +   res2 = ret;
> > +   else
> > +   pr_err("error %d tearing down aio extensions\n", ret);
> > +   }
> >  
> > /*
> >  * Special case handling for sync iocbs:
> > @@ -1350,15 +1366,167 @@ rw_common:
> > return 0;
> >  }
> >  
> > +/* IO extension code */
> > +#define REQUIRED_STRUCTURE_SIZE(type, member)  \
> > +   (offsetof(type, member) + sizeof(((type *)NULL)->member))
> > +#define IO_EXT_SIZE(member) \
> > +   REQUIRED_STRUCTURE_SIZE(struct io_extension, member)
> > +
> > +struct io_extension_type {
> > +   unsigned int type;
> > +   unsigned int extension_struct_size;
> > +   int (*setup_fn)(struct kiocb *, int is_write);
> > +   int (*destroy_fn)(struct kiocb *);
> > +};
> > +
> > +static struct io_extensi

[PATCH RESEND] scsi_transport_sas: move bsg destructor into sas_rphy_remove

2014-04-02 Thread Joe Lawrence
The recent change in sysfs, bcdde7e221a8750f9b62b6d0bd31b72ea4ad9309
"sysfs: make __sysfs_remove_dir() recursive" revealed an asymmetric
rphy device creation/deletion sequence in scsi_transport_sas:

  modprobe mpt2sas
sas_rphy_add
  device_add A   rphy->dev
  device_add B   sas_device transport class
  device_add C   sas_end_device transport class
  device_add D   bsg class

  rmmod mpt2sas
sas_rphy_delete
  sas_rphy_remove
device_del B
device_del C
device_del A
  sysfs_remove_group recursive sysfs dir removal
  sas_rphy_free
device_del D warning

  where device A is the parent of B, C, and D.

When sas_rphy_free tries to unregister the bsg request queue (device D
above), the ensuing sysfs cleanup discovers that its sysfs group has
already been removed and emits a warning, "sysfs group... not found for
kobject 'end_device-X:0'".

Since bsg creation is a side effect of sas_rphy_add, move its
complementary removal call into sas_rphy_remove. This imposes the
following tear-down order for the devices above: D, B, C, A.

Note the sas_device and sas_end_device transport class devices (B and C
above) are created and destroyed both via the list match traversal in
attribute_container_device_trigger, so the order in which they are
handled is fixed. This is fine as long as they are deleted before their
parent device.

Signed-off-by: Joe Lawrence 
Cc: "James E.J. Bottomley" 
Cc: Dan Williams 
Cc: "Reddy, Sreekanth" 
---
 drivers/scsi/scsi_transport_sas.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_transport_sas.c 
b/drivers/scsi/scsi_transport_sas.c
index 1b68142..c341f85 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -1621,8 +1621,6 @@ void sas_rphy_free(struct sas_rphy *rphy)
list_del(&rphy->list);
mutex_unlock(&sas_host->lock);
 
-   sas_bsg_remove(shost, rphy);
-
transport_destroy_device(dev);
 
put_device(dev);
@@ -1681,6 +1679,7 @@ sas_rphy_remove(struct sas_rphy *rphy)
}
 
sas_rphy_unlink(rphy);
+   sas_bsg_remove(NULL, rphy);
transport_remove_device(dev);
device_del(dev);
 }
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND] scsi_transport_sas: move bsg destructor into sas_rphy_remove

2014-04-02 Thread Joe Lawrence
On Wed, 2 Apr 2014 16:40:41 -0400
Joe Lawrence  wrote:

> Since bsg creation is a side effect of sas_rphy_add, move its
> complementary removal call into sas_rphy_remove. 

Hello James,

This a resend of a patch posted to quiet an end-device sysfs warning in
its device deletion path when removing the mpt2sas driver:

  (initial report)
  sysfs group not found for kobject on mpt2sas unload
  http://marc.info/?t=13849746004
  
  (patch + ACK + comments)
  http://marc.info/?t=13860945551

  (gentoo, LSI repro)
  mpt2sas driver barfs when force removing a drive on 3.13.1
  http://marc.info/?t=13912235131

Dan Williams had a few other suggestions for cleanup in this area, but
those could be handled in a subsequent patch.  This one missed 3.14
inclusion and the warning still occurs in Linus's tree as of today
(post scsi/block merge) hence the repost.

Regards,

-- Joe
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/6] aio/dio: enable PI passthrough

2014-04-02 Thread Darrick J. Wong
On Wed, Apr 02, 2014 at 01:01:33PM -0700, Zach Brown wrote:
> > +static int setup_pi_ext(struct kiocb *req, int is_write)
> > +{
> > +   struct file *file = req->ki_filp;
> > +   struct io_extension *ext = &req->ki_ioext->ke_kern;
> > +   void *p;
> > +   unsigned long start, end;
> > +   int retval;
> > +
> > +   if (!(file->f_flags & O_DIRECT)) {
> > +   pr_debug("EINVAL: can't use PI without O_DIRECT.\n");
> > +   return -EINVAL;
> > +   }
> > +
> > +   BUG_ON(req->ki_ioext->ke_pi_iter.pi_userpages);
> > +
> > +   end = (((unsigned long)ext->ie_pi_buf) + ext->ie_pi_buflen +
> > +   PAGE_SIZE - 1) >> PAGE_SHIFT;
> > +   start = ((unsigned long)ext->ie_pi_buf) >> PAGE_SHIFT;
> > +   req->ki_ioext->ke_pi_iter.pi_offset = offset_in_page(ext->ie_pi_buf);
> > +   req->ki_ioext->ke_pi_iter.pi_len = ext->ie_pi_buflen;
> > +   req->ki_ioext->ke_pi_iter.pi_nrpages = end - start;
> > +   p = kzalloc(req->ki_ioext->ke_pi_iter.pi_nrpages *
> > +   sizeof(struct page *),
> > +   GFP_NOIO);
> 
> Can userspace give us bad data and get us to generate insane allcation
> attempt warnings?

Easily.  One of the bits I have to work on for the PI part is figuring out how
to check with the PI provider that the arguments (the iovec and the pi buffer)
actually make any sense, in terms of length and alignment requirements (PI
tuples can't cross pages).  I think it's as simple as adding a bio_integrity
ops call, and then calling down to it from the kiocb level.

One thing I'm not sure about: What's the largest IO (in terms of # of blocks,
not # of struct iovecs) that I can throw at the kernel?

> > +   if (p == NULL) {
> > +   pr_err("%s: no room for page array?\n", __func__);
> > +   return -ENOMEM;
> > +   }
> > +   req->ki_ioext->ke_pi_iter.pi_userpages = p;
> > +
> > +   retval = get_user_pages_fast((unsigned long)ext->ie_pi_buf,
> > +req->ki_ioext->ke_pi_iter.pi_nrpages,
> > +is_write,
> 
> Isn't this is_write backwards?  If it's a write syscall then the PI
> pages is going to be read from.

Yes, I think so.  Good catch!

--D
> 
> - z
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] fs/bio-integrity: remove duplicate code

2014-04-02 Thread Darrick J. Wong
On Wed, Apr 02, 2014 at 12:17:58PM -0700, Zach Brown wrote:
> > +static int bio_integrity_generate_verify(struct bio *bio, int operate)
> >  {
> 
> > +   if (operate)
> > +   sector = bio->bi_iter.bi_sector;
> > +   else
> > +   sector = bio->bi_integrity->bip_iter.bi_sector;
> 
> > +   if (operate) {
> > +   bi->generate_fn(&bix);
> > +   } else {
> > +   ret = bi->verify_fn(&bix);
> > +   if (ret) {
> > +   kunmap_atomic(kaddr);
> > +   return ret;
> > +   }
> > +   }
> 
> I was glad to see this replaced with explicit sector and func arguments
> in later refactoring in the 6/ patch.
> 
> But I don't think the function poiner casts in that 6/ patch are wise
> (Or even safe all the time, given crazy function pointer trampolines?
> Is that still a thing?).  I'd have made a single walk_fn type that
> returns and have the non-returning iterators just return 0.

Noted.  I cleaned all that crap out just yesterday, so now there's only one
walk function and some context data that gets passed to the iterator function.
Much less horrifying.

(I really only included this patch so that I'd have less rebasing work when
3.15-rc1 comes out.)

--D
> 
> - z
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Scst-devel] OSS target - VMware SCSI reservation bug conformity.

2014-04-02 Thread Dr. Greg Wettstein
On Mar 28,  8:53pm, Vladislav Bolkhovitin wrote:
} Subject: Re: [Scst-devel] OSS target - VMware SCSI reservation bug conform

> Dr. Greg Wettstein, on 03/27/2014 11:21 AM wrote:
> > Hi, hope the week is going well for everyone.
> >
> > There appears to be evidence that VMware has an issue with exact SCSI
> > standards compliance when it comes to handling corner cases with SCSI
> > reservation requests.  It appears as if Dell is pushing firmware hot
> > fixes for the EqualLogic controllers to work around the issue.

> Hi Greg,

Hi Vlad, thanks for taking the time to respond.

> That's interesting, but, unfortunately, your message doesn't contain
> sufficient technical details to look at this issue, if it exists. Or
> do you think we are magicians who can read minds and see through
> walls? ;)

Actually I did, but I assumed a maintenance contract would be needed
for that.

I had the following reasons for raising the issue:

1.) Does anyone in the open-source storage eco-system,
ie. SCST/LIO whatever, have any confirmation that this
is a known issue.

2.) To alert other open-source storage users/vendors that, at
least from our experience, it appears as if the problem
may be rare but legitimate.

3.) To determine, if the bug could be found, whether things
like mode pages would make sense in an open-source stack
to address issues such as this.

I've had a fair amount of private feedback that there is a good chance
the issue may be legitimate.  I've also had feedback that there may be
other issues with VMware 'corner-case' behavior.  Given the nature of
the VAAI extensions/primitives rolling out I would anticipate that to
be a fertile area for these types of issues as well.

I'm not even sure, given the nature of the issue, if it could be
tracked down but everyone who is interested in the issue can now be
looking for it.

Here is the essence of what we have to work with, redacted due to the
volume of messages, to sentinel events.

SDS proxy: 
Mar 19 21:50:30 PROXY kernel: rport-3:0-0: blocked FC remote port time out: 
removing target and saving binding
Mar 19 21:50:30 PROXY kernel: sd 3:0:0:0: rejecting I/O to offline device
Mar 19 21:50:32 PROXY kernel: qla2xxx [:04:00.1]-8009:3: DEVICE RESET 
ISSUED nexus=3:0:0 cmd=da751240.
..
.. Noise from Qlogic adapter doing DTB reset.
..
Mar 19 21:50:40 PROXY kernel: qla2xxx [:04:00.1]-8018:3: ADAPTER RESET 
ISSUED nexus=3:0:0.
..
.. More noise from the Qlogic adapater.
..
Mar 19 21:55:43 PROXY kernel: qla2xxx [:04:00.1]-8017:3: ADAPTER RESET 
SUCCEEDED nexus=3:0:0.
---

VMware logs: --
Mar 19 21:49:08 VMWARE1 2014-03-20T02:49:08.592Z VMWARE1 vmkernel: 
cpu2:8432)<6>qla2xxx :42:00.0: scsi(8:0:1): Abort command succeeded -- 1 
1247096017.
Mar 19 21:49:12 VMWARE1 2014-03-20T02:49:12.664Z VMWARE1 vobd:  
[vmfsCorrelator] 11527431227280us: [esx.problem.vmfs.heartbeat.timedout] 
52f3b635-9ea13918-af49-bc305bee68bc VOLUMENAME
Mar 19 21:49:14 VMWARE1 2014-03-20T02:49:12.670Z VMWARE1 Hostd: [24FDEB90 info 
'Vimsvc.ha-eventmgr'] Event 466 : Lost access to volume 
52f3b635-9ea13918-af49-bc305bee68bc (VOLUMENAME) due to connectivity issues. 
Recovery attempt is in progress and outcome will be reported shortly.
..
.. Noise from VMWARE hosts about wanting path updates - only single path
.. to proxy, heartbeat timeouts etc.
..
Mar 19 21:49:32 VMWARE1 2014-03-20T02:49:32.911Z VMWARE1 vmkernel: 
cpu10:8202)NMP: nmp_PathDetermineFailure:2084: SCSI cmd RESERVE failed on path 
vmhba2:C0:T0:L1, reservation state on device eui.6665356665393330 is unknown.
..
.. More noise from VMWARE hosts, additional reservation failures etc.
..
Mar 19 21:50:32 VMWARE2 2014-03-20T02:50:30.554Z VMWARE2 vmkwarning: 
cpu22:8237)WARNING: HBX: 564: Volume 52f3b635-9ea13918-af49-bc305bee68bc 
("VOLUMENAME") may be damaged on disk. Corrupt heartbeat detected at offset 
3653632: [HB state 0 offset 0 gen 0 stampUS 0 uuid --
---

And at that point things were pretty much over with.

I would certainly be open to suggestions on how to track or obtain
useful information for you.  The SDS proxy was sustaining about 350
megabytes/second of I/O from seven initiators so I don't think turning
on target mode debugging and cmd tracing is much of an option.

> Thanks,
> Vlad

Have a good weekend.

Greg

}-- End of excerpt from Vladislav Bolkhovitin

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.   Specializing in information infra-structure
Fargo, ND  58102development.
PH: 701-281-1686
FAX: 701-281-3949   EMAIL: g...@enjellic.com
--

Re: [RFC PATCH DONOTMERGE v2 0/6] userspace PI passthrough via AIO/DIO

2014-04-02 Thread Zach Brown
> 's a small number of them with strong semantics because they're a part
> of the syscall ABI.

("There's" a small number of them..  vim troubles :))

- z
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/6] aio/dio: enable PI passthrough

2014-04-02 Thread Zach Brown
> +static int setup_pi_ext(struct kiocb *req, int is_write)
> +{
> + struct file *file = req->ki_filp;
> + struct io_extension *ext = &req->ki_ioext->ke_kern;
> + void *p;
> + unsigned long start, end;
> + int retval;
> +
> + if (!(file->f_flags & O_DIRECT)) {
> + pr_debug("EINVAL: can't use PI without O_DIRECT.\n");
> + return -EINVAL;
> + }
> +
> + BUG_ON(req->ki_ioext->ke_pi_iter.pi_userpages);
> +
> + end = (((unsigned long)ext->ie_pi_buf) + ext->ie_pi_buflen +
> + PAGE_SIZE - 1) >> PAGE_SHIFT;
> + start = ((unsigned long)ext->ie_pi_buf) >> PAGE_SHIFT;
> + req->ki_ioext->ke_pi_iter.pi_offset = offset_in_page(ext->ie_pi_buf);
> + req->ki_ioext->ke_pi_iter.pi_len = ext->ie_pi_buflen;
> + req->ki_ioext->ke_pi_iter.pi_nrpages = end - start;
> + p = kzalloc(req->ki_ioext->ke_pi_iter.pi_nrpages *
> + sizeof(struct page *),
> + GFP_NOIO);

Can userspace give us bad data and get us to generate insane allcation
attempt warnings?

> + if (p == NULL) {
> + pr_err("%s: no room for page array?\n", __func__);
> + return -ENOMEM;
> + }
> + req->ki_ioext->ke_pi_iter.pi_userpages = p;
> +
> + retval = get_user_pages_fast((unsigned long)ext->ie_pi_buf,
> +  req->ki_ioext->ke_pi_iter.pi_nrpages,
> +  is_write,

Isn't this is_write backwards?  If it's a write syscall then the PI
pages is going to be read from.

- z
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/6] io: define an interface for IO extensions

2014-04-02 Thread Zach Brown
> @@ -916,6 +921,17 @@ void aio_complete(struct kiocb *iocb, long res, long 
> res2)
>   struct io_event *ev_page, *event;
>   unsigned long   flags;
>   unsigned tail, pos;
> + int ret;
> +
> + ret = io_teardown_extensions(iocb);
> + if (ret) {
> + if (!res)
> + res = ret;
> + else if (!res2)
> + res2 = ret;
> + else
> + pr_err("error %d tearing down aio extensions\n", ret);
> + }

This ends up trying to copy the kernel's io_extension copy back to
userspace from interrupts, which obviously won't fly.

And to what end?  So that maybe someone can later add an 'extension'
that can fill in some field that's then copied to userspace?  But by
copying the entire argument struct back?

Let's not get ahead of ourselves.  If they're going to try and give
userspace some feedback after IO completion they're going to have to try
a lot harder because they don't have acces to the submitting task
context anymore.  They'd have to pin some reference to a feedback
mechanism in the in-flight io.  I think we'd want that explicit in the
iocb, not hiding off on the other side of this extension interface.

I'd just remove this generic teardown callback path entirely.  If
there's PI state hanging off the iocb tear it down during iocb teardown.

> +struct io_extension_type {
> + unsigned int type;
> + unsigned int extension_struct_size;
> + int (*setup_fn)(struct kiocb *, int is_write);
> + int (*destroy_fn)(struct kiocb *);
> +};

I'd also get rid of all of this.  More below.

> +static int io_setup_extensions(struct kiocb *req, int is_write,
> +struct io_extension __user *ioext)
> +{
> + struct io_extension_type *iet;
> + __u64 sz, has;
> + int ret;
> +
> + /* Check size of buffer */
> + if (unlikely(copy_from_user(&sz, &ioext->ie_size, sizeof(sz
> + return -EFAULT;
> + if (sz > PAGE_SIZE ||
> + sz > sizeof(struct io_extension) ||
> + sz < IO_EXT_SIZE(ie_has))
> + return -EINVAL;
> +
> + /* Check that the buffer's big enough */
> + if (unlikely(copy_from_user(&has, &ioext->ie_has, sizeof(has
> + return -EFAULT;
> + ret = io_check_bufsize(has, sz);
> + if (ret)
> + return ret;
> +
> + /* Copy from userland */
> + req->ki_ioext = kzalloc(sizeof(struct kio_extension), GFP_NOIO);
> + if (!req->ki_ioext)
> + return -ENOMEM;
> +
> + req->ki_ioext->ke_user = ioext;
> + if (unlikely(copy_from_user(&req->ki_ioext->ke_kern, ioext, sz))) {
> + ret = -EFAULT;
> + goto out;
> + }

(Isn't there some allocate-and-copy-from-userspace helper now? But..)

I don't like the rudundancy of the implicit size requirement by a
field's flag being set being duplicated by the explicit size argument.
What does that give us, exactly?

Our notion of the total size only seems to only matter if we're copying
the entire struct from userspace and I'm don't think we need to do that.

For each argument, we're translating it into some kernel equivalent,
right?  Fields in the iocb  As each of these are initialized I'd just
test the presence bits and __get_user() the userspace arguemnts
directly, or copy_from_user() something slightly more complicated on to
the stack.

That gets rid of us having to care about the size at all.  It stops us
from allocating a kernel copy and pinning it for the duration of the IO.
We'd just be sampling the present userspace arguments as we initialie
the iocb during submission.

> + /* Try to initialize all the extensions */
> + has = 0;
> + for (iet = extensions; iet->type != IO_EXT_INVALID; iet++) {
> + if (!(req->ki_ioext->ke_kern.ie_has & iet->type))
> + continue;
> + ret = iet->setup_fn(req, is_write);
> + if (ret) {
> + req->ki_ioext->ke_kern.ie_has = has;
> + goto out_destroy;
> + }
> + has |= iet->type;
> + }

So instead of doing all this we'd test explicit bits and act
accordingly.  If they're trivial translations between userspace fields
and iocb fields we could just do it inline in this helper that'd be more
like iocb_parse_more_args(iocb, struct __user *ptr).  For more
complicated stuff, like the PI page pinning, it could call out to PI.

> + user_ext = (struct io_extension __user *)iocb->aio_extension_ptr;

Need a __force there?  Has this been run through sparse?

- z
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/6] io: define an interface for IO extensions

2014-04-02 Thread Jeff Moyer
"Darrick J. Wong"  writes:

> Define a generic interface to allow userspace to attach metadata to an
> IO operation.  This interface will be used initially to implement
> protection information (PI) pass through, though it ought to be usable
> by anyone else desiring to extend the IO interface.  It should not be
> difficult to modify the non-AIO calls to use this mechanism.

My main issue with this patch is determining what exactly gets returned
to userspace when there is an issue in the teardown_extensions path.
It looks like you'll get the first error propagated from
io_teardown_extensions, others are ignored.  Then, in aio_complete, if
there was no error with the I/O, then you'll get the teardown error
reported in event->res, otherwise you'll get it in event->res2.  So,
what are the valid errors returned by the teardown routine for
extensions?  How is the userspace app supposed to determine where the
error came from, the I/O or a failure in the extension teardown?

I think it may make sense to only use res2 for reporting io extension
teardown failures.  Any new code that will use extensions can certainly
be written to check both res and res2, and this method would prevent the
ambiguity I mentioned.

Finally, I know this is an RFC, but please add some man-page changes to
your patch set, and CC linux-man.  Michael Kerrisk typically has
valuable advice on new APIs.

Cheers,
Jeff

>
> Signed-off-by: Darrick J. Wong 
> ---
>  fs/aio.c |  180 
> +-
>  include/linux/aio.h  |7 ++
>  include/uapi/linux/aio_abi.h |   15 +++-
>  3 files changed, 197 insertions(+), 5 deletions(-)
>
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 062a5f6..0c40bdc 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -158,6 +158,11 @@ static struct vfsmount *aio_mnt;
>  static const struct file_operations aio_ring_fops;
>  static const struct address_space_operations aio_ctx_aops;
>  
> +static int io_teardown_extensions(struct kiocb *req);
> +static int io_setup_extensions(struct kiocb *req, int is_write,
> +struct io_extension __user *ioext);
> +static int iocb_setup_extensions(struct iocb *iocb, struct kiocb *req);
> +
>  static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages)
>  {
>   struct qstr this = QSTR_INIT("[aio]", 5);
> @@ -916,6 +921,17 @@ void aio_complete(struct kiocb *iocb, long res, long 
> res2)
>   struct io_event *ev_page, *event;
>   unsigned long   flags;
>   unsigned tail, pos;
> + int ret;
> +
> + ret = io_teardown_extensions(iocb);
> + if (ret) {
> + if (!res)
> + res = ret;
> + else if (!res2)
> + res2 = ret;
> + else
> + pr_err("error %d tearing down aio extensions\n", ret);
> + }
>  
>   /*
>* Special case handling for sync iocbs:
> @@ -1350,15 +1366,167 @@ rw_common:
>   return 0;
>  }
>  
> +/* IO extension code */
> +#define REQUIRED_STRUCTURE_SIZE(type, member)\
> + (offsetof(type, member) + sizeof(((type *)NULL)->member))
> +#define IO_EXT_SIZE(member) \
> + REQUIRED_STRUCTURE_SIZE(struct io_extension, member)
> +
> +struct io_extension_type {
> + unsigned int type;
> + unsigned int extension_struct_size;
> + int (*setup_fn)(struct kiocb *, int is_write);
> + int (*destroy_fn)(struct kiocb *);
> +};
> +
> +static struct io_extension_type extensions[] = {
> + {IO_EXT_INVALID, 0, NULL, NULL},
> +};
> +
> +static int is_write_iocb(struct iocb *iocb)
> +{
> + switch (iocb->aio_lio_opcode) {
> + case IOCB_CMD_PWRITE:
> + case IOCB_CMD_PWRITEV:
> + return 1;
> + default:
> + return 0;
> + }
> +}
> +
> +static int io_teardown_extensions(struct kiocb *req)
> +{
> + struct io_extension_type *iet;
> + int ret, ret2;
> +
> + if (req->ki_ioext == NULL)
> + return 0;
> +
> + /* Shut down all the extensions */
> + ret = 0;
> + for (iet = extensions; iet->type != IO_EXT_INVALID; iet++) {
> + if (!(req->ki_ioext->ke_kern.ie_has & iet->type))
> + continue;
> + ret2 = iet->destroy_fn(req);
> + if (ret2 && !ret)
> + ret = ret2;
> + }
> +
> + /* Copy out return values */
> + if (unlikely(copy_to_user(req->ki_ioext->ke_user,
> +   &req->ki_ioext->ke_kern,
> +   sizeof(struct io_extension {
> + if (!ret)
> + ret = -EFAULT;
> + }
> +
> + kfree(req->ki_ioext);
> + req->ki_ioext = NULL;
> + return ret;
> +}
> +
> +static int io_check_bufsize(__u64 has, __u64 size)
> +{
> + struct io_extension_type *iet;
> + __u64 all_flags = 0;
> +
> + for (iet = extensions; iet->type != IO_EXT_INVALID; iet++) {
> + all_flags |= iet->type;
> + if (!(has &

Re: [PATCH 1/6] fs/bio-integrity: remove duplicate code

2014-04-02 Thread Zach Brown
> +static int bio_integrity_generate_verify(struct bio *bio, int operate)
>  {

> + if (operate)
> + sector = bio->bi_iter.bi_sector;
> + else
> + sector = bio->bi_integrity->bip_iter.bi_sector;

> + if (operate) {
> + bi->generate_fn(&bix);
> + } else {
> + ret = bi->verify_fn(&bix);
> + if (ret) {
> + kunmap_atomic(kaddr);
> + return ret;
> + }
> + }

I was glad to see this replaced with explicit sector and func arguments
in later refactoring in the 6/ patch.

But I don't think the function poiner casts in that 6/ patch are wise
(Or even safe all the time, given crazy function pointer trampolines?
Is that still a thing?).  I'd have made a single walk_fn type that
returns and have the non-returning iterators just return 0.

- z
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH DONOTMERGE v2 0/6] userspace PI passthrough via AIO/DIO

2014-04-02 Thread Zach Brown
On Mon, Mar 24, 2014 at 09:22:31AM -0700, Darrick J. Wong wrote:
> This RFC provides a rough implementation of a mechanism to allow
> userspace to attach protection information (e.g. T10 DIF) data to a
> disk write and to receive the information alongside a disk read.

I have some comments for you! :)  Mostly about the interface up in aio.
I don't have all that much to say about the bio/pi bits.

> Patch #2 implements a generic IO extension interface so that we can
> receive a struct io_extension from userspace containing the structure
> size, a flag telling us which extensions we'd like to use (ie_has),
> and (eventually) extension data.  There's a small framework for
> mapping ie_has bits to actual extensions.

I still really don't think that we should be thinking of these as
generic extensions.  We're talking about arguments to syscalls.  's a
small number of them with strong semantics because they're a part of the
syscall ABI.  I don't think we should implement them by iterating over
per-field ops structs.

Anyway, more in reply to the patches.

- z
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] target/core: T10-Dif: check HW support capabilities

2014-04-02 Thread Nicholas A. Bellinger
On Wed, 2014-04-02 at 14:43 +0300, sagi grimberg wrote:
> On 4/1/2014 8:45 PM, Nicholas A. Bellinger wrote:
> > On Tue, 2014-04-01 at 20:27 +0300, sagi grimberg wrote:
> >> On 4/1/2014 8:09 PM, Martin K. Petersen wrote:
>  "Sagi" == Sagi Grimberg  writes:
> >>> Sagi> I originally wrote the code to support that. But it got left
> >>> Sagi> behind since I figured it is not an interesting use-case.  If your
> >>> Sagi> beckend doesn't support T10-PI why should the target publish it
> >>> Sagi> support it and ask the device to strip/insert it?  I suppose it is
> >>> Sagi> to allow the initiator to protect half-way, but I don't know how
> >>> Sagi> interesting it is if the data is not stored with protection...
> >>>
> >>> That depends what you do on the backend. There are several devices out
> >>> there that expose PI to the host but use a different protection scheme
> >>> internally. And then synthesize PI on the host-facing side. Some even do
> >>> T10 PI to an internal protection scheme and then back to T10 PI when
> >>> talking to the disk drives in the back end.
> >>>
> >> Hey Martin,
> >>
> >> I understand, but even for internal different T10-PI schemes, is
> >> stripping protection from incoming data
> >> at the fabric level (and then do whatever with it in the backend level)
> >> the right thing to do here?
> >> I mean we basically lose protection across the PCI with this scheme
> >> aren't we?
> >>
> > The WRITE_STRIP + READ_INSERT case would be still be useful for IBLOCK
> > backends that don't support real hw PI, so that at least the protection
> > can be in place for data movement between physical machines.
> >
> > Also, I think the amount of changes required to support this type of
> > configuration in target-core are quite small.
> 
> Not sure I understood your intention here.
> Do you mean that the backstore doesn't support T10-PI, the transport 
> support T10-PI and target publish to fabric that it support T10-PI?
> This will lead to the DIN_INSERT/DOUT_STRIP you are referring to right?

Correct.

> Is there any value to publish the fabric PI support if your backing 
> device doesn't support it?

So yes, I think there is value is allowing a transport to publish PI
support for a device, even when the backend does not support it.

In terms of the possibilities for data corruption, moving payloads
between physical hosts would certainly have a higher potential given the
complexity and number of overall components involved in the transaction.

If all unprotected devices should report PI (by default) when the
transport supports it is a separate question..  ;)

> 
> The opposite case (backstore support, transport no support) it makes 
> sense to NOT publish support and do half-way protection in SW
> (the backstore is formatted with PI).
> 

Correct.

--nab

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] target/core: T10-Dif: check HW support capabilities

2014-04-02 Thread Nicholas A. Bellinger
On Wed, 2014-04-02 at 09:51 +0300, Sagi Grimberg wrote:
> On 4/1/2014 8:45 PM, Nicholas A. Bellinger wrote:
> > On Tue, 2014-04-01 at 20:27 +0300, sagi grimberg wrote:
> >> On 4/1/2014 8:09 PM, Martin K. Petersen wrote:
>  "Sagi" == Sagi Grimberg  writes:
> >>> Sagi> I originally wrote the code to support that. But it got left
> >>> Sagi> behind since I figured it is not an interesting use-case.  If your
> >>> Sagi> beckend doesn't support T10-PI why should the target publish it
> >>> Sagi> support it and ask the device to strip/insert it?  I suppose it is
> >>> Sagi> to allow the initiator to protect half-way, but I don't know how
> >>> Sagi> interesting it is if the data is not stored with protection...
> >>>
> >>> That depends what you do on the backend. There are several devices out
> >>> there that expose PI to the host but use a different protection scheme
> >>> internally. And then synthesize PI on the host-facing side. Some even do
> >>> T10 PI to an internal protection scheme and then back to T10 PI when
> >>> talking to the disk drives in the back end.
> >>>
> >> Hey Martin,
> >>
> >> I understand, but even for internal different T10-PI schemes, is
> >> stripping protection from incoming data
> >> at the fabric level (and then do whatever with it in the backend level)
> >> the right thing to do here?
> >> I mean we basically lose protection across the PCI with this scheme
> >> aren't we?
> >>
> > The WRITE_STRIP + READ_INSERT case would be still be useful for IBLOCK
> > backends that don't support real hw PI, so that at least the protection
> > can be in place for data movement between physical machines.
> >
> > Also, I think the amount of changes required to support this type of
> > configuration in target-core are quite small.
> 
> So trying to understand how this will come to use.
> Target will publish the fabric T10-PI support based only on the 
> transport configuration (not accounting the backing devices configuration).

Yes, passing in the transport configuration for PI at
transport_init_session() time seems to make the most sense here in order
to address all fabric types.

> Then upon each cmd the target will look on {backstore configuration, 
> PROTECT bit, transport configuration} - then will decide on protection
> operation (STRIP/INSERT/PASS).
> 
> Looks right?
> 

Correct.

I'm thinking it makes sense for target-core to perform the WRITE_INSERT
+ READ_STRIP (in software) when the transport does not directly support
PI, but the backend has PI enabled.

--nab

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/23] scsi: Use pci_enable_msix_range() instead of pci_enable_msix()

2014-04-02 Thread Bjorn Helgaas
On Wed, Apr 2, 2014 at 5:49 AM, Alexander Gordeev  wrote:
> On Wed, Mar 12, 2014 at 10:04:40PM -0600, Bjorn Helgaas wrote:
>> Hi James,
>>
>> I think Alexander sent these to linux-scsi hoping that you would handle
>> them, but I know it's a hassle because they depend on f7fc32c, which went
>> in after the merge window.
>>
>> I'd be glad to review these and apply them through my tree, unless you want
>> to do it.  I'd like to get these merged in the v3.15 merge window so
>> Alexander can move on to something else.  I haven't checked for merge
>> conflicts with scsi.git yet, but I assume they'd be pretty trivial if there
>> are any.
>
> Hi Bjorn,
>
> It has shifted to v3.16, right?

Yep, sorry, I meant to get to these, but was too busy, so they'll
probably go in v3.16.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] blk-mq: do not initialize req->special

2014-04-02 Thread Christoph Hellwig
Drivers can reach their private data easily using the blk_mq_rq_to_pdu
helper and don't need req->special.  By not initializing it code can
be simplified nicely, and we also shave off a few more instructions from
the I/O path.

Signed-off-by: Christoph Hellwig 
---
 block/blk-flush.c  |   10 ++
 block/blk-mq.c |   15 ++-
 block/blk-mq.h |1 -
 drivers/block/null_blk.c   |4 ++--
 drivers/block/virtio_blk.c |6 +++---
 5 files changed, 9 insertions(+), 27 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 43e6b47..9a0c427 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -306,22 +306,16 @@ static bool blk_kick_flush(struct request_queue *q)
 */
q->flush_pending_idx ^= 1;
 
+   blk_rq_init(q, q->flush_rq);
if (q->mq_ops) {
-   struct blk_mq_ctx *ctx = first_rq->mq_ctx;
-   struct blk_mq_hw_ctx *hctx = q->mq_ops->map_queue(q, ctx->cpu);
-
-   blk_mq_rq_init(hctx, q->flush_rq);
-   q->flush_rq->mq_ctx = ctx;
-
/*
 * Reuse the tag value from the fist waiting request,
 * with blk-mq the tag is generated during request
 * allocation and drivers can rely on it being inside
 * the range they asked for.
 */
+   q->flush_rq->mq_ctx = first_rq->mq_ctx;
q->flush_rq->tag = first_rq->tag;
-   } else {
-   blk_rq_init(q, q->flush_rq);
}
 
q->flush_rq->cmd_type = REQ_TYPE_FS;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4274ee0..871acd6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -248,24 +248,13 @@ struct request *blk_mq_alloc_reserved_request(struct 
request_queue *q, int rw,
 }
 EXPORT_SYMBOL(blk_mq_alloc_reserved_request);
 
-/*
- * Re-init and set pdu, if we have it
- */
-void blk_mq_rq_init(struct blk_mq_hw_ctx *hctx, struct request *rq)
-{
-   blk_rq_init(hctx->queue, rq);
-
-   if (hctx->cmd_size)
-   rq->special = blk_mq_rq_to_pdu(rq);
-}
-
 static void __blk_mq_free_request(struct blk_mq_hw_ctx *hctx,
  struct blk_mq_ctx *ctx, struct request *rq)
 {
const int tag = rq->tag;
struct request_queue *q = rq->q;
 
-   blk_mq_rq_init(hctx, rq);
+   blk_rq_init(hctx->queue, rq);
blk_mq_put_tag(hctx->tags, tag);
 
blk_mq_queue_exit(q);
@@ -1139,7 +1128,7 @@ static int blk_mq_init_rq_map(struct blk_mq_hw_ctx *hctx,
left -= to_do * rq_size;
for (j = 0; j < to_do; j++) {
hctx->rqs[i] = p;
-   blk_mq_rq_init(hctx, hctx->rqs[i]);
+   blk_rq_init(hctx->queue, hctx->rqs[i]);
p += rq_size;
i++;
}
diff --git a/block/blk-mq.h b/block/blk-mq.h
index ebbe6ba..238379a 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -27,7 +27,6 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool 
async);
 void blk_mq_init_flush(struct request_queue *q);
 void blk_mq_drain_queue(struct request_queue *q);
 void blk_mq_free_queue(struct request_queue *q);
-void blk_mq_rq_init(struct blk_mq_hw_ctx *hctx, struct request *rq);
 
 /*
  * CPU hotplug helpers
diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index 091b9ea..71df69d 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -226,7 +226,7 @@ static void null_cmd_end_timer(struct nullb_cmd *cmd)
 
 static void null_softirq_done_fn(struct request *rq)
 {
-   end_cmd(rq->special);
+   end_cmd(blk_mq_rq_to_pdu(rq));
 }
 
 static inline void null_handle_cmd(struct nullb_cmd *cmd)
@@ -311,7 +311,7 @@ static void null_request_fn(struct request_queue *q)
 
 static int null_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *rq)
 {
-   struct nullb_cmd *cmd = rq->special;
+   struct nullb_cmd *cmd = blk_mq_rq_to_pdu(rq);
 
cmd->rq = rq;
cmd->nq = hctx->driver_data;
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 0eace43..11e8f4b 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -112,7 +112,7 @@ static int __virtblk_add_req(struct virtqueue *vq,
 
 static inline void virtblk_request_done(struct request *req)
 {
-   struct virtblk_req *vbr = req->special;
+   struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
int error = virtblk_result(vbr);
 
if (req->cmd_type == REQ_TYPE_BLOCK_PC) {
@@ -154,7 +154,7 @@ static void virtblk_done(struct virtqueue *vq)
 static int virtio_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *req)
 {
struct virtio_blk *vblk = hctx->queue->queuedata;
-   struct virtblk_req *vbr = req->special;
+   struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
unsigned long flags;
unsigned int num;
const bool last = (req->cmd_flags & R

[PATCH 1/5] blk-mq: initialize resid_len

2014-04-02 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 block/blk-mq.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b1bcc61..44f738e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -350,6 +350,8 @@ static void blk_mq_start_request(struct request *rq, bool 
last)
 
trace_block_rq_issue(q, rq);
 
+   rq->resid_len = blk_rq_bytes(rq);
+
/*
 * Just mark start time and set the started bit. Due to memory
 * ordering, we know we'll see the correct deadline as long as
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] blk-mq: make ->flush_rq fully transparent to drivers

2014-04-02 Thread Christoph Hellwig
Drivers shouldn't have to care about the block layer setting aside a
request to implement the flush state machine.  We already override the
mq context and tag to make it more transparent, but so far haven't deal
with the driver private data in the request.  Make sure to override this
as well, and while we're at it add a proper helper sitting in blk-mq.c
that implements the full impersonation.

Signed-off-by: Christoph Hellwig 
---
 block/blk-flush.c |   12 ++--
 block/blk-mq.c|   20 
 block/blk-mq.h|2 ++
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 9a0c427..b218f61 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -307,16 +307,8 @@ static bool blk_kick_flush(struct request_queue *q)
q->flush_pending_idx ^= 1;
 
blk_rq_init(q, q->flush_rq);
-   if (q->mq_ops) {
-   /*
-* Reuse the tag value from the fist waiting request,
-* with blk-mq the tag is generated during request
-* allocation and drivers can rely on it being inside
-* the range they asked for.
-*/
-   q->flush_rq->mq_ctx = first_rq->mq_ctx;
-   q->flush_rq->tag = first_rq->tag;
-   }
+   if (q->mq_ops)
+   blk_mq_clone_flush_request(q->flush_rq, first_rq);
 
q->flush_rq->cmd_type = REQ_TYPE_FS;
q->flush_rq->cmd_flags = WRITE_FLUSH | REQ_FLUSH_SEQ;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 871acd6..289b7e5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -272,6 +272,26 @@ void blk_mq_free_request(struct request *rq)
__blk_mq_free_request(hctx, ctx, rq);
 }
 
+/*
+ * Clone all relevant state from a request that has been put on hold in
+ * the flush state machine into the preallocated flush request that hangs
+ * off the request queue.
+ *
+ * For a driver the flush request should be invisible, that's why we are
+ * impersonating the original request here.
+ */
+void blk_mq_clone_flush_request(struct request *flush_rq,
+   struct request *orig_rq)
+{
+   struct blk_mq_hw_ctx *hctx =
+   orig_rq->q->mq_ops->map_queue(orig_rq->q, orig_rq->mq_ctx->cpu);
+
+   flush_rq->mq_ctx = orig_rq->mq_ctx;
+   flush_rq->tag = orig_rq->tag;
+   memcpy(blk_mq_rq_to_pdu(flush_rq), blk_mq_rq_to_pdu(orig_rq),
+   hctx->cmd_size);
+}
+
 bool blk_mq_end_io_partial(struct request *rq, int error, unsigned int 
nr_bytes)
 {
if (blk_update_request(rq, error, nr_bytes))
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 238379a..7964dad 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -27,6 +27,8 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool 
async);
 void blk_mq_init_flush(struct request_queue *q);
 void blk_mq_drain_queue(struct request_queue *q);
 void blk_mq_free_queue(struct request_queue *q);
+void blk_mq_clone_flush_request(struct request *flush_rq,
+   struct request *orig_rq);
 
 /*
  * CPU hotplug helpers
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] blk-mq: fix blk_mq_end_io_partial

2014-04-02 Thread Christoph Hellwig
We need to pass the actual nr_bytes instead of all bytes in the request.

Signed-off-by: Christoph Hellwig 
---
 block/blk-mq.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 44f738e..4274ee0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -285,7 +285,7 @@ void blk_mq_free_request(struct request *rq)
 
 bool blk_mq_end_io_partial(struct request *rq, int error, unsigned int 
nr_bytes)
 {
-   if (blk_update_request(rq, error, blk_rq_bytes(rq)))
+   if (blk_update_request(rq, error, nr_bytes))
return true;
 
blk_account_io_done(rq);
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/5] blk-mq: add ->init_request and ->exit_request methods

2014-04-02 Thread Christoph Hellwig
The current blk_mq_init_commands/blk_mq_free_commands interface has a two
problems:

 1) Because only the constructor is passed to blk_mq_init_commands there
is no easy way to clean up when a comman initialization failed.  The
current code simply leaks the allocations done in the constructor.
 2) There is no good place to call blk_mq_free_commands: before
blk_cleanup_queue there is no guarantee that all outstanding commands
have completed, so we can't free them yet.  After blk_cleanup_queue
the queue has usually been freed.  This can be worked around by
grabbing an unconditional reference before calling blk_cleanup_queue
and dropping it after blk_mq_free_commands is done, although that's
not exatly pretty and driver writers are guaranteed to get it wrong
sooner or later.

Both issues are easily fixed by making the request constructor and
destructor normal blk_mq_ops methods.

Signed-off-by: Christoph Hellwig 
---
 block/blk-mq.c |  105 ++--
 drivers/block/virtio_blk.c |   23 +-
 include/linux/blk-mq.h |   14 +-
 3 files changed, 55 insertions(+), 87 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 289b7e5..e0b65c4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1005,74 +1005,20 @@ static void blk_mq_hctx_notify(void *data, unsigned 
long action,
blk_mq_put_ctx(ctx);
 }
 
-static int blk_mq_init_hw_commands(struct blk_mq_hw_ctx *hctx,
-  int (*init)(void *, struct blk_mq_hw_ctx *,
-   struct request *, unsigned int),
-  void *data)
+static void blk_mq_free_rq_map(struct blk_mq_hw_ctx *hctx, void *driver_data)
 {
-   unsigned int i;
-   int ret = 0;
-
-   for (i = 0; i < hctx->queue_depth; i++) {
-   struct request *rq = hctx->rqs[i];
-
-   ret = init(data, hctx, rq, i);
-   if (ret)
-   break;
-   }
-
-   return ret;
-}
-
-int blk_mq_init_commands(struct request_queue *q,
-int (*init)(void *, struct blk_mq_hw_ctx *,
-   struct request *, unsigned int),
-void *data)
-{
-   struct blk_mq_hw_ctx *hctx;
-   unsigned int i;
-   int ret = 0;
-
-   queue_for_each_hw_ctx(q, hctx, i) {
-   ret = blk_mq_init_hw_commands(hctx, init, data);
-   if (ret)
-   break;
-   }
-
-   return ret;
-}
-EXPORT_SYMBOL(blk_mq_init_commands);
-
-static void blk_mq_free_hw_commands(struct blk_mq_hw_ctx *hctx,
-   void (*free)(void *, struct blk_mq_hw_ctx *,
-   struct request *, unsigned int),
-   void *data)
-{
-   unsigned int i;
+   struct page *page;
 
-   for (i = 0; i < hctx->queue_depth; i++) {
-   struct request *rq = hctx->rqs[i];
+   if (hctx->rqs && hctx->queue->mq_ops->exit_request) {
+   int i;
 
-   free(data, hctx, rq, i);
+   for (i = 0; i < hctx->queue_depth; i++) {
+   if (!hctx->rqs[i])
+   continue;
+   hctx->queue->mq_ops->exit_request(driver_data, hctx,
+ hctx->rqs[i], i);
+   }
}
-}
-
-void blk_mq_free_commands(struct request_queue *q,
- void (*free)(void *, struct blk_mq_hw_ctx *,
-   struct request *, unsigned int),
- void *data)
-{
-   struct blk_mq_hw_ctx *hctx;
-   unsigned int i;
-
-   queue_for_each_hw_ctx(q, hctx, i)
-   blk_mq_free_hw_commands(hctx, free, data);
-}
-EXPORT_SYMBOL(blk_mq_free_commands);
-
-static void blk_mq_free_rq_map(struct blk_mq_hw_ctx *hctx)
-{
-   struct page *page;
 
while (!list_empty(&hctx->page_list)) {
page = list_first_entry(&hctx->page_list, struct page, lru);
@@ -1097,10 +1043,12 @@ static size_t order_to_size(unsigned int order)
 }
 
 static int blk_mq_init_rq_map(struct blk_mq_hw_ctx *hctx,
- unsigned int reserved_tags, int node)
+   struct blk_mq_reg *reg, void *driver_data, int node)
 {
+   unsigned int reserved_tags = reg->reserved_tags;
unsigned int i, j, entries_per_page, max_order = 4;
size_t rq_size, left;
+   int error;
 
INIT_LIST_HEAD(&hctx->page_list);
 
@@ -1149,14 +1097,23 @@ static int blk_mq_init_rq_map(struct blk_mq_hw_ctx 
*hctx,
for (j = 0; j < to_do; j++) {
hctx->rqs[i] = p;
blk_rq_init(hctx->queue, hctx->rqs[i]);
+   if (reg->ops->init_request) {
+   error = reg->ops->init_request(dr

blk-mq updates, V3

2014-04-02 Thread Christoph Hellwig
The first two patches are unchanged from the last two iterations of
the series.

Patches 3 and 4 are new: 3 is a preparation for 4 and other future work,
and patch 4 makes sure that drivers don't have to implement special
workarounds for the magic flush request.  Patch 5 is an updated version
of the earlier init/exit_request versions which should make everyone
happy now as there are no special NULL hardware contexts or negative
request numbers due to patch 4 sorting out the flush_rq mess.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/7] mpt3sas: Remove use of DEF_SCSI_QCMD

2014-04-02 Thread Matthew Wilcox
On Wed, Apr 02, 2014 at 03:41:15AM -0700, Christoph Hellwig wrote:
> On Thu, Mar 27, 2014 at 04:40:31PM -0400, Matthew Wilcox wrote:
> > Removing the host_lock from the I/O submission path gives a huge
> > scalability improvement.
> 
> This looks reasonable to me, but did you do a full audit that all state
> is properly synchronized in the functions called from _scsih_qcmd_lck?
> 
> If so that should be stated in the patch description.

I did one for mpt2sas back in July 2011 when I first submitted this patch.
http://marc.info/?l=linux-scsi&m=130980763617589&w=2

I must confess to having not done so for this iteration.  The equivalent
to this patch has been in RHEL since 6.2, so I hope that gives us some
amount of confidence.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 00/23] scsi: Use pci_enable_msix_range() instead of pci_enable_msix()

2014-04-02 Thread Alexander Gordeev
On Wed, Mar 12, 2014 at 10:04:40PM -0600, Bjorn Helgaas wrote:
> Hi James,
> 
> I think Alexander sent these to linux-scsi hoping that you would handle
> them, but I know it's a hassle because they depend on f7fc32c, which went
> in after the merge window.
> 
> I'd be glad to review these and apply them through my tree, unless you want
> to do it.  I'd like to get these merged in the v3.15 merge window so
> Alexander can move on to something else.  I haven't checked for merge
> conflicts with scsi.git yet, but I assume they'd be pretty trivial if there
> are any.

Hi Bjorn,

It has shifted to v3.16, right?

> Bjorn

-- 
Regards,
Alexander Gordeev
agord...@redhat.com
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] target/core: T10-Dif: check HW support capabilities

2014-04-02 Thread sagi grimberg

On 4/1/2014 8:45 PM, Nicholas A. Bellinger wrote:

On Tue, 2014-04-01 at 20:27 +0300, sagi grimberg wrote:

On 4/1/2014 8:09 PM, Martin K. Petersen wrote:

"Sagi" == Sagi Grimberg  writes:

Sagi> I originally wrote the code to support that. But it got left
Sagi> behind since I figured it is not an interesting use-case.  If your
Sagi> beckend doesn't support T10-PI why should the target publish it
Sagi> support it and ask the device to strip/insert it?  I suppose it is
Sagi> to allow the initiator to protect half-way, but I don't know how
Sagi> interesting it is if the data is not stored with protection...

That depends what you do on the backend. There are several devices out
there that expose PI to the host but use a different protection scheme
internally. And then synthesize PI on the host-facing side. Some even do
T10 PI to an internal protection scheme and then back to T10 PI when
talking to the disk drives in the back end.


Hey Martin,

I understand, but even for internal different T10-PI schemes, is
stripping protection from incoming data
at the fabric level (and then do whatever with it in the backend level)
the right thing to do here?
I mean we basically lose protection across the PCI with this scheme
aren't we?


The WRITE_STRIP + READ_INSERT case would be still be useful for IBLOCK
backends that don't support real hw PI, so that at least the protection
can be in place for data movement between physical machines.

Also, I think the amount of changes required to support this type of
configuration in target-core are quite small.


Not sure I understood your intention here.
Do you mean that the backstore doesn't support T10-PI, the transport 
support T10-PI and target publish to fabric that it support T10-PI?

This will lead to the DIN_INSERT/DOUT_STRIP you are referring to right?
Is there any value to publish the fabric PI support if your backing 
device doesn't support it?


The opposite case (backstore support, transport no support) it makes 
sense to NOT publish support and do half-way protection in SW

(the backstore is formatted with PI).

Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] fusion: Add free msg frames to the head, not tail of list

2014-04-02 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/7] mpt2sas: Add free smids to the head, not tail of list

2014-04-02 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/7] mpt2sas: Remove uses of serial_number

2014-04-02 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/7] mpt3sas: Remove use of DEF_SCSI_QCMD

2014-04-02 Thread Christoph Hellwig
On Thu, Mar 27, 2014 at 04:40:31PM -0400, Matthew Wilcox wrote:
> Removing the host_lock from the I/O submission path gives a huge
> scalability improvement.

This looks reasonable to me, but did you do a full audit that all state
is properly synchronized in the functions called from _scsih_qcmd_lck?

If so that should be stated in the patch description.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/7] mpt3sas: Remove uses of serial_number

2014-04-02 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] tcm_qla2xxx: T10-Dif set harware capability

2014-04-02 Thread sagi grimberg

On 4/1/2014 8:40 PM, Nicholas A. Bellinger wrote:



So no way to get it centralized?

Yes, still working on how that might look..


  I still don't understand the iscsi/iser constraint.

Every other fabric aside from iscsi/iser could simply provide a
TFO->get_fabric_prot(se_tpg) to query for supported PI.

The reason why iscsi/iser is unique is because we can have different
network portals (eg: iser protected + traditional iscsi unprotected) on
the same se_tpg endpoint.


I see. That seems solvable to me... perhaps the easiest thing is to just 
go ahead and support T10-PI

in iscsi-tcp (SW)...

Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 0/7] Performance improvements for LSI SCSI cards

2014-04-02 Thread Desai, Kashyap
CC few more LSI driver developers.

> -Original Message-
> From: Nicholas A. Bellinger [mailto:n...@linux-iscsi.org]
> Sent: Monday, March 31, 2014 11:25 AM
> To: Matthew Wilcox
> Cc: linux-scsi@vger.kernel.org; wi...@linux.intel.com; Desai, Kashyap;
> Saxena, Sumit; James Bottomley
> Subject: Re: [PATCH 0/7] Performance improvements for LSI SCSI cards
> 
> Hi Matthew,
> 
> On Thu, 2014-03-27 at 16:40 -0400, Matthew Wilcox wrote:
> > The host lock is a serious scalability problem on 2-socket and larger
> > systems which are doing a lot of I/O.  Before removing the temporary
> > usgae of DEF_SCSI_QCMD, we need to remove all uses of serial_number.
> >
> > An unrelated performance issue is that reusing the most recent
> > driver-specific data structure to track the I/O instead of the least
> > recently used keeps the cache-hot lines in use, which is a nice
> > performance improvement.  It's already present in the mpt3sas driver,
> > it just didn't make it into the fusion or mpt2sas drivers yet.
> >
> > Matthew Wilcox (7):
> >   mpt3sas: Remove uses of serial_number
> >   mpt3sas: Remove use of DEF_SCSI_QCMD
> >   mpt2sas: Remove uses of serial_number
> >   mpt2sas: Remove use of DEF_SCSI_QCMD
> >   mpt2sas: Add free smids to the head, not tail of list
> >   fusion: Add free msg frames to the head, not tail of list
> >   fusion: Remove use of DEF_SCSI_QCMD
> >
> >
> 
> +1 to this long overdue series to enable host_lock-less mode with
> mpt*sas + fusion.  (CC'ing LSI folks + jejb)
> 
> Reviewed-by: Nicholas Bellinger 
> 

N�r��yb�X��ǧv�^�)޺{.n�+{���"�{ay�ʇڙ�,j��f���h���z��w���
���j:+v���w�j�mzZ+�ݢj"��!�i

Re: [RFC] blk-mq: support for shared tags

2014-04-02 Thread Christoph Hellwig
On Tue, Apr 01, 2014 at 05:16:21PM -0700, Matias Bjorling wrote:
> Hi Christoph,
> 
> Can you rebase it on top of 3.14. I have trouble applying it for testing.

Hi Martin,

the series is based on top of Jens' for-next branch.  I've also pushed out a
git tree to the blk-mq-share-tags.2 branch of

git://git.infradead.org/users/hch/scsi.git

to make testing and reviewing easier.

> For nvme, there's need for two separate types of queues. The admin queue
> (before initializing blk-mq) and the actual hardware queues.
> 
> Should we allow the driver to get/put tags before initializing blk-mq?
> Or let drivers implement their own framework?

What do you mean with initializing blk-mq?  We need to allocate data
structures for sure, and I don't see much else in terms of initialization
in blk-mq.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html