[openib-general] A question about sa_query
Hi, There is something that bothers me in sa_query. According to table 115 in the IB-SPEC when the status in the MAD hdr is 1,2 or 3 it shouldn't be considered to as an error. (1 means busy, 2 means redirection, and 3 means both). The function "recv_handler" in core/sa_query.c sets the status of the sa_query before calling the callback function. It sets the status according to the status returned in the mad header. (mad_recv_wc->recv_buf.mad->mad_hdr.status) If the status in the mad_hdr is different from 0 it sets the return status to -EINVAL. This mean that the higher layers (e.g., SRP) do not know what was the exact status and therefore treat status 1 (busy) as an error. Am I missing something? Ishai ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH] IB/SRP Userspace: srptools/srp_daemon - Fix connect bug and add support for user specified initiator extension
Thanks for your patch. I agree with some of the changes you suggest and disagree with others. It will be useful to post a different patch for each logical change. > 1. Fixes bug in srp_daemon for the case where if it is invoked with the '-e' option, it fails to connect to the SRP targets because of a newline character in the parameter string. You are right that the '\n' is redundant, but I have not seen the bug it creates. The last parameter in the string is considered to be a string by ib_srp and therefore ib_srp will ignore the newline. > 2. Changes the name of the constant 'MAX_TRAGET_CONFIG_STR_STRING' to 'MAX_TARGET_CONFIG_STR_STRING'. Thanks, I will apply this change. > 3. Changes the behavior of the '-n' option to srp_daemon. The earlier behavior printed the initiator extension. The new behavior allows the user to specify an initiator extension as an argument to the '-n' option. I do not think we want to change the -n behavior to this one. First of all your approach induces the same initiator_ext to all the targets discovered by this srp_daemon. Secondly If someone uses random values for the initiator_ext it may cause a waste of resources in the target: the target can never tell when a connection had failed (or an initiator performed a boot) and will keep the connection alive. When there is an attempt to reconnect to this target with the same initiator_ext, the target knows he can close the old connection. This is the reason we decided to have a convention. The convention is to use the target port guid. The advantage of this convention is that it allows us to have two connection on the same time from an initiator to both ports of the target. I understand that some targets may need a different initiator_ext, but you should add a new flag for actually setting the initiator_ext and leave -n untouched. You are welcome to send such a patch. Ishai ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [openfabrics-ewg] OFED 1.1 release schedule
Hi, Let me first explain why the current OFED release does not support SRP-HA on RHEL4. SRP-HA is using Device Mapper multipath. Multipath prerequisites include udev of higher version than 050. RHEL4 distributions includes udev 039. udev is an important part of the distribution and I do not think that users will be ready to upgrade it in order to have SRP-HA. To my best knowledge the main reason that multipath needs at least udev 050 is because it uses the RUN option (This option executes its given parameter after the device exist). Multipath uses the RUN option to execute kpartx that handles the partitions of the new device. SRP-HA also uses the RUN option to execute the multipath command. I have an idea on how to overcome this problem. I want to implement a srp-multipath-daemon. This daemon will get kpartx and multipath requests using a shared message queue. The udev will use the PROGRAM option (That executes its given parameter immediately - before the device exist) to post request to this shared message queue and return immediately. The daemon will wait for the device to create and only than it will execute the commands. In any case this technique will not be a part of the coming OFED release. Ishai -Original Message-From: Sharma, Karun [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 17, 2006 5:11 AMTo: Tziporet Koren; Open FabricsCc: openibSubject: RE: [openfabrics-ewg] OFED 1.1 release schedule The plan is OK with Silverstorm. I have a question though. What are the plans to support SRP-HA feature on RHEL4 kernels ? Thanks Karun ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCH] IB/SRP set initiator_extention from user space
There is a need for an initiator to connect to the same target several times, e.g., once from each IB port of the target. Some targets do not support multichannel. In order to work with them as well: 1) Use port_guid instead of node_guid. 2) Allow the user to set the identifier_extension when providing the target attributes. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> --- Roland, Madhu and MST, I think this summarizes our discussion. Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-10-03 15:38:16.0 +0200 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-10-03 18:10:34.0 +0200 @@ -329,25 +329,29 @@ static int srp_send_req(struct srp_targe req->priv.req_it_iu_len = cpu_to_be32(srp_max_iu_len); req->priv.req_buf_fmt = cpu_to_be16(SRP_BUF_FORMAT_DIRECT | SRP_BUF_FORMAT_INDIRECT); + /* * In the published SRP specification (draft rev. 16a), the * port identifier format is 8 bytes of ID extension followed -* by 8 bytes of GUID. Older drafts put the two halves in the -* opposite order, so that the GUID comes first. +* by 8 bytes of port GUID. Older drafts put the two halves in the +* opposite order, so that the port GUID comes first. * * Targets conforming to these obsolete drafts can be * recognized by the I/O Class they report. */ + if (target->io_class == SRP_REV10_IB_IO_CLASS) { memcpy(req->priv.initiator_port_id, - target->srp_host->initiator_port_id + 8, 8); + &target->path.sgid.global.interface_id, 8); memcpy(req->priv.initiator_port_id + 8, - target->srp_host->initiator_port_id, 8); + &target->initiator_ext, 8); memcpy(req->priv.target_port_id, &target->ioc_guid, 8); memcpy(req->priv.target_port_id + 8, &target->id_ext, 8); } else { memcpy(req->priv.initiator_port_id, - target->srp_host->initiator_port_id, 16); + &target->initiator_ext, 8); + memcpy(req->priv.initiator_port_id + 8, + &target->path.sgid.global.interface_id, 8); memcpy(req->priv.target_port_id, &target->id_ext, 8); memcpy(req->priv.target_port_id + 8, &target->ioc_guid, 8); } @@ -1557,6 +1561,7 @@ enum { SRP_OPT_MAX_SECT= 1 << 5, SRP_OPT_MAX_CMD_PER_LUN = 1 << 6, SRP_OPT_IO_CLASS= 1 << 7, + SRP_OPT_INITIATOR_EXT = 1 << 8, SRP_OPT_ALL = (SRP_OPT_ID_EXT | SRP_OPT_IOC_GUID | SRP_OPT_DGID | @@ -1573,6 +1578,7 @@ static match_table_t srp_opt_tokens = { { SRP_OPT_MAX_SECT, "max_sect=%d" }, { SRP_OPT_MAX_CMD_PER_LUN, "max_cmd_per_lun=%d"}, { SRP_OPT_IO_CLASS, "io_class=%x" }, + { SRP_OPT_INITIATOR_EXT,"initiator_ext=%s" }, { SRP_OPT_ERR, NULL} }; @@ -1672,6 +1678,12 @@ static int srp_parse_options(const char target->io_class = token; break; + case SRP_OPT_INITIATOR_EXT: + p = match_strdup(args); + target->initiator_ext = cpu_to_be64(simple_strtoull(p, NULL, 16)); + kfree(p); + break; + default: printk(KERN_WARNING PFX "unknown parameter or missing value " "'%s' in target creation request\n", p); @@ -1820,9 +1832,6 @@ static struct srp_host *srp_add_port(str host->dev = device; host->port = port; - host->initiator_port_id[7] = port; - memcpy(host->initiator_port_id + 8, &device->dev->node_guid, 8); - host->class_dev.class = &srp_class; host->class_dev.dev = device->dev->dma_device; snprintf(host->class_dev.class_id, BUS_ID_SIZE, "srp-%s-%d", Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.h === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.h2006-10-03 15:38:16.0 +0200 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.h 2006-10-03 18:05:50.0 +0200 @@ -91,7 +91,6
[openib-general] [PATCH] IB_CM: Limit the MRA timeout
There is a bug in SRP Engenio target that send a large value as service timeout. (It gets 30 which mean timeout of (2^(30-8))=4195 sec.) Such a long timeout is not reasonable and it may leave the kernel module waiting on wait_for_completion and may stuck a lot of processes. The following patch allows the load of ib_cm module with a limit on the timeout. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> --- Index: last_stable/drivers/infiniband/core/cm.c === --- last_stable.orig/drivers/infiniband/core/cm.c 2006-10-03 15:30:38.0 +0200 +++ last_stable/drivers/infiniband/core/cm.c2006-10-03 15:39:53.0 +0200 @@ -54,6 +54,13 @@ MODULE_AUTHOR("Sean Hefty"); MODULE_DESCRIPTION("InfiniBand CM"); MODULE_LICENSE("Dual BSD/GPL"); +static int mra_timeout_limit = 0; + +module_param(mra_timeout_limit, int, 0444); +MODULE_PARM_DESC(mra_timeout_limit, + "Limit the MRA timeout according to this value if != 0"); + + static void cm_add_one(struct ib_device *device); static void cm_remove_one(struct ib_device *device); @@ -2297,6 +2304,9 @@ static int cm_mra_handler(struct cm_work timeout = cm_convert_to_ms(cm_mra_get_service_timeout(mra_msg)) + cm_convert_to_ms(cm_id_priv->av.packet_life_time); + if (mra_timeout_limit && timeout > mra_timeout_limit) + timeout = mra_timeout_limit; + spin_lock_irqsave(&cm_id_priv->lock, flags); switch (cm_id_priv->id.state) { case IB_CM_REQ_SENT: -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCH] IB/SRP: Remove redundant memset of the target
scsi_host_alloc already sets the entire scsi_host (including the privsize to zero) This patch removes the redundant memset. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-10-03 15:41:49.0 +0200 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-10-03 15:55:19.0 +0200 @@ -1731,7 +1731,6 @@ static ssize_t srp_create_target(struct target_host->max_lun = SRP_MAX_LUN; target = host_to_target(target_host); - memset(target, 0, sizeof *target); target->io_class = SRP_REV16A_IB_IO_CLASS; target->scsi_host = target_host; -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH] IB/SRP: Enable multichannel
Roland Dreier wrote: > Maybe we should just use the port GUID instead of the node GUID to > form the initiator ID? That would solve this pretty cleanly I think. This is also Vu's idea. There are two issues: 1) My patch allows a sophisticated user to have two logical connections on the same physical solution. He can have different connection parameters (e.g., MAX_CMD_PER_LUN) according to the application needs. Do you think there is such need? 2) In the current implementation there is a problem when there are two connections on the same physical connection - when the second connection sends REQ to the target, the target sends a DREQ to the first connection, but when someone tries to access the first scsi_host, ib_srp tries to reconnect the first connection and then the second connection gets a DREQ - and so the ping pong goes. And if there is a multipath daemon that checks the status of the connections this ping pong can be for ever. We need to find a way to eliminate this behavior. Ishai ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCH] IB/SRP: allowing multiple connections from taregt to initiator
SRP High Availability should enable an initiator to connect to the same target several times, e.g., once from each IB port of the target. Some targets do not support multichannel. In order to work with them as well we will use another identifier_extension to the initiator port for each target connection. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> --- I think this is the best solution. It allows users to use all four physical connections from the initiator to target. It also allows users to have several logical connections on one physical connection (If they want connection with different attributes - for example different max_cmd_per_lun). It is SRP spec compliant. I also added a module param, so it is possible to turn this option off. Index: latest/drivers/infiniband/ulp/srp/ib_srp.c === --- latest.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-09-27 10:36:13.0 +0300 +++ latest/drivers/infiniband/ulp/srp/ib_srp.c 2006-09-27 16:48:12.0 +0300 @@ -85,6 +85,13 @@ MODULE_PARM_DESC(mellanox_workarounds, static const u8 mellanox_oui[3] = { 0x00, 0x02, 0xc9 }; +static int variable_identifier_extension = 1; + +module_param(variable_identifier_extension, int, 0444); +MODULE_PARM_DESC(variable_identifier_extension, +"Use another identifier_extension on each connection to target" +", allows multichannel connection on all targets if != 0"); + static void srp_add_one(struct ib_device *device); static void srp_remove_one(struct ib_device *device); static void srp_completion(struct ib_cq *cq, void *target_ptr); @@ -329,6 +336,7 @@ static int srp_send_req(struct srp_targe req->priv.req_it_iu_len = cpu_to_be32(srp_max_iu_len); req->priv.req_buf_fmt = cpu_to_be16(SRP_BUF_FORMAT_DIRECT | SRP_BUF_FORMAT_INDIRECT); + /* * In the published SRP specification (draft rev. 16a), the * port identifier format is 8 bytes of ID extension followed @@ -341,13 +349,23 @@ static int srp_send_req(struct srp_targe if (target->io_class == SRP_REV10_IB_IO_CLASS) { memcpy(req->priv.initiator_port_id, target->srp_host->initiator_port_id + 8, 8); - memcpy(req->priv.initiator_port_id + 8, - target->srp_host->initiator_port_id, 8); + if (variable_identifier_extension) + memcpy(req->priv.initiator_port_id + 8, + &target, sizeof target); + else + memcpy(req->priv.initiator_port_id + 8, + target->srp_host->initiator_port_id, 8); memcpy(req->priv.target_port_id, &target->ioc_guid, 8); memcpy(req->priv.target_port_id + 8, &target->id_ext, 8); } else { - memcpy(req->priv.initiator_port_id, - target->srp_host->initiator_port_id, 16); + if (variable_identifier_extension) + memcpy(req->priv.initiator_port_id, + &target, sizeof target); + else + memcpy(req->priv.initiator_port_id, + target->srp_host->initiator_port_id, 8); + memcpy(req->priv.initiator_port_id + 8, + target->srp_host->initiator_port_id + 8, 8); memcpy(req->priv.target_port_id, &target->id_ext, 8); memcpy(req->priv.target_port_id + 8, &target->ioc_guid, 8); } @@ -1823,7 +1841,8 @@ static struct srp_host *srp_add_port(str host->dev = device; host->port = port; - host->initiator_port_id[7] = port; + if (!variable_identifier_extension) + host->initiator_port_id[7] = port; memcpy(host->initiator_port_id + 8, &device->dev->node_guid, 8); host->class_dev.class = &srp_class; -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCH] IB/SRP: Enable multichannel
Hi Roland, SRP High Availability needs an initiator to connect to the same target several times, e.g., once from each IB port of the target (this way we can use device mapper multipath for failover). Note that both connections are actually active, e.g. multipath is issuing commands to to get the remote scsi id. Since multiple channel operation is currently disabled in connection request, each new connection request will cause the target to disconnect the existing connection which forces us to bounce a lot between the two channels. This patch enables multiple channel operation in connection requests, to avoid getting disconnects when multiple connections are active. There does not seem to be any harm in doing this even when multipath is not used. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> --- Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-09-26 09:22:13.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-09-26 14:54:35.0 +0300 @@ -329,6 +329,7 @@ static int srp_send_req(struct srp_targe req->priv.req_it_iu_len = cpu_to_be32(srp_max_iu_len); req->priv.req_buf_fmt = cpu_to_be16(SRP_BUF_FORMAT_DIRECT | SRP_BUF_FORMAT_INDIRECT); + req->priv.req_flags = SRP_MULTICHAN_MULTI; /* * In the published SRP specification (draft rev. 16a), the * port identifier format is 8 bytes of ID extension followed -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCH] IB/SRP identify QP in error state
There is a bug in mthca low level driver. A call to ib_post_send that tries to post to a QP that is in error state does not return immediately with error. It terminates with errors after a timeout. This causes SRP to wait a long time to reconnect. (Each abort call and each reset_device call performs post_send and waits for the timeout). The following patch solves this problem by identifying the failure and returning an immediate error code. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> --- Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-09-25 13:51:47.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-09-25 15:40:04.0 +0300 @@ -543,6 +543,7 @@ static int srp_reconnect_target(struct s target->tx_head = 0; target->tx_tail = 0; + target->need_reset = 0; ret = srp_connect_target(target); if (ret) goto err; @@ -858,6 +859,7 @@ static void srp_completion(struct ib_cq printk(KERN_ERR PFX "failed %s status %d\n", wc.wr_id & SRP_OP_RECV ? "receive" : "send", wc.status); + target->need_reset = 1; break; } @@ -1313,6 +1315,8 @@ static int srp_abort(struct scsi_cmnd *s printk(KERN_ERR "SRP abort called\n"); + if (target->need_reset) + return FAILED; if (srp_find_req(target, scmnd, &req)) return FAILED; if (srp_send_tsk_mgmt(target, req, SRP_TSK_ABORT_TASK)) @@ -1341,6 +1345,8 @@ static int srp_reset_device(struct scsi_ printk(KERN_ERR "SRP reset_device called\n"); + if (target->need_reset) + return FAILED; if (srp_find_req(target, scmnd, &req)) return FAILED; if (srp_send_tsk_mgmt(target, req, SRP_TSK_LUN_RESET)) @@ -1750,6 +1756,7 @@ static ssize_t srp_create_target(struct goto err_free; } + target->need_reset = 0; ret = srp_connect_target(target); if (ret) { printk(KERN_ERR PFX "Connection failed\n"); Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.h === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.h2006-09-25 13:51:47.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.h 2006-09-25 14:00:36.0 +0300 @@ -158,6 +158,7 @@ struct srp_target_port { struct completion done; int status; enum srp_target_state state; + int need_reset; }; struct srp_iu { -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] FW: OFED 1.1 rc3 srp driver panic
Here is an Oops in ib_srp From: Dachepalli, Sudhir [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 05, 2006 11:09 PMTo: Vu PhamCc: Richard, BillSubject: OFED 1.1 rc3 srp driver panic Hello Vu, I am trying to integrate MPP and OFED 1.1 rc3 srp. Status on following 2 issues. New Host number allocation for controller offline / online - MPP will handle this with out the need to run hot_add. we need to use srp_daemon. scsi error handler invocation - we need to figure out how to cleanly exit out of error handler after cleaning up all the IO's - THIS IS THE BIGGEST ISSUE NOW. Panic I noticed the following panic when I performed sysreboot on controller A while IO is going on : ib_srp: SRP reset_host calledib_srp: QP event 1ib_srp: QP event 1ib_srp: connection closedUnable to handle kernel NULL pointer dereference at RIP:[<>]PML4 214f0d067 PGD 214657067 PMD 0Oops: 0010 [1] SMPCPU 1Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc rdma_ucm(U) rdma_cm(U) ib_addr(U) ib_srp(U) ds yenta_socket pcmcia_core dm_mirror dm_mod button battery ac md5 ipv6 uhci_hcd ehci_hcd ib_mthca(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) e1000 ext3 jbd mppVhba(U) qla2400 qla2322 qla2xxx scsi_transport_fc mptscsih mptsas mptspi mptfc mptscsi mptbase ata_piix libata mppUpper(U) sg sd_mod scsi_modPid: 4991, comm: scsi_eh_7 Not tainted 2.6.9-34.ELsmpRIP: 0010:[<>] [<>]RSP: 0018:01021202dd70 EFLAGS: 00010006RAX: 010210234100 RBX: 0102114b0a28 RCX: 0102114b08a0RDX: 0102114b08b0 RSI: 0102114b0a28 RDI: 010210234100RBP: 0102114b0c08 R08: R09: 000210d6c000R10: 0102114b03c8 R11: 0102114b03c8 R12: 0102114b03c8R13: 01021202dee8 R14: 0102114b R15: 01021202ded8FS: 002a9589a760() GS:804d7b80() knlGS:CS: 0010 DS: ES: CR0: 8005003bCR2: CR3: cfe58000 CR4: 06e0Process scsi_eh_7 (pid: 4991, threadinfo 01021202c000, task 010211e9b030)Stack: a02a1792 0102114b08b0 0102114b03c8 a02a18bf 0038 01021202de78 01021202ddb8 010211e9b030 801333c8Call Trace:{:ib_srp:srp_reset_req+37} {:ib_srp:srp_reconnect_target+288} {default_wake_function+0} {kobject_release+0} {:ib_srp:srp_reset_host+51} {:ib_srp:srp_reset_host+59} {:scsi_mod:scsi_try_host_reset+118} {:scsi_mod:scsi_error_handler+2347} {child_rip+8} {:scsi_mod:scsi_error_handler+0} {child_rip+0} Code: Bad RIP value.RIP [<>] RSP <01021202dd70>CR2: <0>Kernel panic - not syncing: Oops Sudhir Dachepalli Engenio Storage Group LSI Logic Corporation 12331 Riata Trace Parkway Suite B200 Austin , Texas 78727 512 794 3706 phone512 794 3702 fax[EMAIL PROTECTED] www.lsilogic.com/engenio ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] A new version for srp daemon
See Below -Original Message- From: Roland Dreier [mailto:[EMAIL PROTECTED] Sent: Tuesday, August 15, 2006 1:48 AM To: Ishai Rabinovitz Cc: openib-general@openib.org; Tziporet Koren Subject: Re: A new version for srp daemon > I put the code in > https://openib.org/svn/trunk/contrib/mellanox/gen2/src/userspace/srptool s/srp_daemon Seems like a bizarre place for it -- a package inside of the srptools package?? [ishai] This is another tool for srp. I thought that we want to put several tools in srptools directory. I will put it in https://openib.org/svn/gen2/trunk/src/userspace/srp_daemon >7) Uses the umad package. This seems like it adds a fairly complicated dependency (since umad depends on something else, etc) for minimal gain. Was it really worth it in terms of your code? I found the umad API more trouble than it was worth for the original srptools. [ishai] You may be right, but this way I can gain from future improvements in the umad package (I'm optimistic) - R. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] A new version for srp daemon
Adding a subject From: Ishai Rabinovitz Sent: Tuesday, August 15, 2006 12:32 AMTo: 'Roland Dreier'Cc: 'openib-general@openib.org'; Tziporet KorenSubject: Hi Roland, In order to support High-Availability in OFED 1.1, we need more functionality to the srp daemon. Based on your code I implemented a new srp daemon (I listed its new features below) . I put the code in https://openib.org/svn/trunk/contrib/mellanox/gen2/src/userspace/srptools/srp_daemon and I'm in an initial testing stage (there are some known bugs). Since I think that people may still want to use your original ibsrpdm, I think we should keep your version and start a new tool from my code. What do you think? Ishai The new tool main features: 1) Register to Traps 64 and 144. 2) Can ask ib_srp to connect to the targets it finds. 3) Can check if the target is already connected by ib_srp from the same port. 4) Can perform rescan of the fabric every X seconds. 5) Identify SA changes and other events and act accordingly. 6) Can get an hca name and a port number as input (not only a umad device). 7) Uses the umad package. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] (no subject)
Hi Roland, In order to support High-Availability in OFED 1.1, we need more functionality to the srp daemon. Based on your code I implemented a new srp daemon (I listed its new features below) . I put the code in https://openib.org/svn/trunk/contrib/mellanox/gen2/src/userspace/srptools/srp_daemon and I'm in an initial testing stage (there are some known bugs). Since I think that people may still want to use your original ibsrpdm, I think we should keep your version and start a new tool from my code. What do you think? Ishai The new tool main features: 1) Register to Traps 64 and 144. 2) Can ask ib_srp to connect to the targets it finds. 3) Can check if the target is already connected by ib_srp from the same port. 4) Can perform rescan of the fabric every X seconds. 5) Identify SA changes and other events and act accordingly. 6) Can get an hca name and a port number as input (not only a umad device). 7) Uses the umad package. ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [SRP] [RFC] Needed changes to support fail-over drivers
On Tue, Jul 25, 2006 at 04:58:48PM +0300, [EMAIL PROTECTED] wrote: > [CC'ing linux-scsi as well -- I think we'll get better insight from = > there] > > OK, but is this a valid assumption? What happens for iSCSI and/or iSER? >From Mike's response I understand that it is a reasonable behavior to keep the host (at least for a period of time) and let the userspace daemon be responsible to the reconnection or deallocating of that host. > > How does the daemon know when something is gone for good vs. when it > might come back? > I think we should use a time out in the daemon. > > Why does userspace need to be able to disconnect a connection? > There are two options on who will initiate the disconnection: the userspace daemon or the ib_srp module. I considered both options and I was not sure which one is better. I choose to do it in userspace because it looks a good symmetry that both the disconnection and reconnection will be initiate in the same place. I will accept your comment and change it to the kernel. > > Why the asymmetry here? In other words, why does anything work for > reconnect_target but only the literal "erase" work for erase_target? > Because erase_target is a destructive command that can not be reversed I think it should use a more safe approach. -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [SRP] [RFC] Needed changes to support fail-over drivers
Hi, The current SRP initiator code cannot work with several fail-over mechanisms. The current srp driver's behavior when a target off-line then online: 1) The target is offline. 2) the initiator tries to reconnect and fails 3) The initiator calls srp_remove_work that removes the scsi_host. 4) The target is back online. 5) the user (or the ibsrpdm daemon) is expected to execute a new add_target. 6) This creates a new scsi_host (with new names to the devices and new index in the scsi_host directory in sysfs) for this target. Fail-over drivers (e.g., MPP that is used by Engenio and XVM that is used by SGI) have problems with this behavior (item 3). They need the scsi_host to keep exist and return errors in the meanwhile until the connection to the target resumes. In addition remove/re-alloc scsi host is a "heavy" operation instead of disconnect/reconnect the connection only. In order to support these tools I propose the following changes that will allow the user to move the srp initiator to a disconnected state (when the target leaves the fabric) and reconnect it later (when the target returns to the fabric). After these changes will be in the ib_srp module, the ibsrpdm daemon will be able to monitor the presence of targets in the fabric and to use this interface (When targets leave or rejoin the fabric). Here is the description of the new design: (I already implemented most of the code) 1) Split the function srp_reconnect_target into two functions: _srp_disconnect_target and _srp_reconnect_target 2) Adding two new states: SRP_TARGET_DISCONNECTED (The state after _srp_disconnect_target was executed and before _srp_reconnect_target is executed) and SRP_TARGET_DISCONNECTING (The state while in srp_remove_target). 3) Adding new input files in sysfs: /sys/class/scsi_host/host?/{disconnect_target,connect_target,erase_target} 4) Writing the string "remove" to /sys/class/scsi_host/host?/disconnect_target calls srp_disconnect_target that moves the corresponding target to a SRP_TARGET_DISCONNECTED state (After closing the cm, and reset all pending requests). Now when the scsi performs queuecommand to this host the result is DID_NO_CONNECT. This causes the scsi mid-layer to return to the user with an IO error without initiating the scsi error auto recovery chain. 5) Writing anything to /sys/class/scsi_host/host?/reconnect_target calls _srp_reconnect_target that move the target to SRP_TARGET_LIVE state again. 6) Writing "erase" to /sys/class/scsi_host/host?/erase_target calls srp_remove_work that removes the scsi_host. 7) Adding output files in sysfs to present the HCA and port that the initiator used to connect to the target. Using these files and the target GUID the ibsrpdm can know on which scsi_host to perform the reconnect_target. Please comment. -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCH] SRP: Avoid a potential race on target->req_queue
Hi Roland, There is a potential race between srp_reconnect_target and srp_reset_device when they access the target->req_queue. These functions can execute in the same time because srp_reconnect_target is called form srp_reconnect_work that is scheduled by srp_completion, while srp_reset_device is called from the scsi layer. The race is caused because srp_reconnect_target is not holding host_lock while accessing target->req_queue. It assumes that since the state is CONNECTING no other function will access target->req_queue (and this is the case with srp_reset_host for example). There are two possible solutions: 1) Change srp_reset_device: after locking host_lock, it will check the state. Only if the state is LIVE it will execute the loop that access target->req_queue. 2) Change srp_reconnect_target. Before executing the loop that access target->req_queue it will lock host_lock and will release it after the loop. I'm sending a patch for the second solution. If you prefer the first, I have another patch for it (It is a bit longer). Which solution do you like better? Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-06-13 02:24:22.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-06-13 02:26:07.0 +0300 @@ -641,8 +641,10 @@ static int srp_reconnect_target(struct s while (ib_poll_cq(target->cq, 1, &wc) > 0) ; /* nothing */ + spin_lock_irq(target->scsi_host->host_lock); list_for_each_entry_safe(req, tmp, &target->req_queue, list) srp_reset_req(target, req); + spin_unlock_irq(target->scsi_host->host_lock); target->rx_head = 0; target->tx_head = 0; -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] Re: SRP [PATCH 0/4] Kernel support for removal and restoration of target
The idea is that the daemon will notice targets that leave the fabric (for a short time), and will activate remove_target. When the target will return to the fabric, the daemon will activate restore_target. This will make sure that the scsi_host won't go offline (From where there is no return) I'm waiting for suggestion about the mechanism that will be responsible to remove the scsi_host when the target does not return to the fabric after a while. (See my previous mail for details) On Tue, Jun 06, 2006 at 03:11:46PM -0700, Roland Dreier wrote: > I haven't read too deeply yet, but something that would help me > understand the overall plan here would be an explanation of how one > would use the restore_target function. Why would I want to disconnect > from a target but keep the kernel's SCSI device hanging around? > > - R. -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] SRP [PATCH 4/4] show_srp_state
Add query for srp_state in sysfs. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-05-31 18:52:14.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-06-04 14:21:52.0 +0300 @@ -1362,6 +1362,26 @@ static int srp_reset_host(struct scsi_cm return ret; } +static ssize_t show_srp_state(struct class_device *cdev, char *buf) +{ + struct srp_target_port *target = host_to_target(class_to_shost(cdev)); + enum srp_target_state target_state = target->state; + + static const char *state_name[] = { + [SRP_TARGET_LIVE] = "LIVE", + [SRP_TARGET_CONNECTING] = "CONNECTING", + [SRP_TARGET_DISCONNECTING] = "DISCONNECTING", + [SRP_TARGET_DISCONNECTED] = "DISCONNECTED", + [SRP_TARGET_DEAD] = "DEAD", + [SRP_TARGET_REMOVED]= "REMOVED", + }; + + if (target_state >= 0 && target_state < ARRAY_SIZE(state_name)) + return sprintf(buf, "%s\n", state_name[target_state]); + + return sprintf(buf, "UNKNOWN\n"); +} + static ssize_t show_id_ext(struct class_device *cdev, char *buf) { struct srp_target_port *target = host_to_target(class_to_shost(cdev)); @@ -1439,6 +1459,7 @@ static ssize_t show_zero_req_lim(struct static CLASS_DEVICE_ATTR(remove_target, S_IWUSR, NULL, srp_remove_target); static CLASS_DEVICE_ATTR(restore_target, S_IWUSR, NULL, srp_restore_target); +static CLASS_DEVICE_ATTR(srp_state, S_IRUGO, show_srp_state, NULL); static CLASS_DEVICE_ATTR(id_ext,S_IRUGO, show_id_ext, NULL); static CLASS_DEVICE_ATTR(ioc_guid, S_IRUGO, show_ioc_guid,NULL); static CLASS_DEVICE_ATTR(service_id,S_IRUGO, show_service_id, NULL); @@ -1447,6 +1468,7 @@ static CLASS_DEVICE_ATTR(dgid,S_IRUGO, static struct class_device_attribute *srp_host_attrs[] = { &class_device_attr_remove_target, &class_device_attr_restore_target, + &class_device_attr_srp_state, &class_device_attr_id_ext, &class_device_attr_ioc_guid, &class_device_attr_service_id, -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] SRP [PATCH 3/4] restore target
Add support to restore_target from sysfs. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-06-04 10:01:50.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-06-04 10:02:27.0 +0300 @@ -1551,16 +1551,21 @@ static ssize_t show_zero_req_lim(struct static ssize_t srp_remove_target(struct class_device *cdev, const char *buf, size_t count); -static CLASS_DEVICE_ATTR(remove_target, S_IWUSR, NULL, srp_remove_target); -static CLASS_DEVICE_ATTR(id_ext, S_IRUGO, show_id_ext, NULL); -static CLASS_DEVICE_ATTR(ioc_guid, S_IRUGO, show_ioc_guid, NULL); -static CLASS_DEVICE_ATTR(service_id, S_IRUGO, show_service_id, NULL); -static CLASS_DEVICE_ATTR(pkey, S_IRUGO, show_pkey, NULL); -static CLASS_DEVICE_ATTR(dgid, S_IRUGO, show_dgid, NULL); -static CLASS_DEVICE_ATTR(zero_req_lim, S_IRUGO, show_zero_req_lim, NULL); +static ssize_t srp_restore_target(struct class_device *cdev, + const char *buf, size_t count); + +static CLASS_DEVICE_ATTR(remove_target, S_IWUSR, NULL, srp_remove_target); +static CLASS_DEVICE_ATTR(restore_target, S_IWUSR, NULL, srp_restore_target); +static CLASS_DEVICE_ATTR(id_ext,S_IRUGO, show_id_ext, NULL); +static CLASS_DEVICE_ATTR(ioc_guid, S_IRUGO, show_ioc_guid,NULL); +static CLASS_DEVICE_ATTR(service_id,S_IRUGO, show_service_id, NULL); +static CLASS_DEVICE_ATTR(pkey, S_IRUGO, show_pkey,NULL); +static CLASS_DEVICE_ATTR(dgid, S_IRUGO, show_dgid,NULL); +static CLASS_DEVICE_ATTR(zero_req_lim, S_IRUGO, show_zero_req_lim,NULL); static struct class_device_attribute *srp_host_attrs[] = { &class_device_attr_remove_target, + &class_device_attr_restore_target, &class_device_attr_id_ext, &class_device_attr_ioc_guid, &class_device_attr_service_id, @@ -1861,6 +1866,17 @@ static ssize_t srp_remove_target(struct return count; } +static ssize_t srp_restore_target(struct class_device *cdev, +const char *buf, size_t count) +{ + int ret = _srp_restore_target(host_to_target(class_to_shost(cdev))); + + if (ret) + return ret; + + return count; +} + static ssize_t show_ibdev(struct class_device *class_dev, char *buf) { struct srp_host *host = -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] SRP [PATCH 2/4] remove target
Add support to remove_target from sysfs. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-06-05 16:46:55.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-06-05 17:11:11.0 +0300 @@ -1516,6 +1516,10 @@ return sprintf(buf, "%d\n", target->zero_req_lim); } +static ssize_t srp_remove_target(struct class_device *cdev, +const char *buf, size_t count); + +static CLASS_DEVICE_ATTR(remove_target, S_IWUSR, NULL, srp_remove_target); static CLASS_DEVICE_ATTR(id_ext, S_IRUGO, show_id_ext, NULL); static CLASS_DEVICE_ATTR(ioc_guid, S_IRUGO, show_ioc_guid, NULL); static CLASS_DEVICE_ATTR(service_id, S_IRUGO, show_service_id, NULL); @@ -1524,6 +1528,7 @@ static CLASS_DEVICE_ATTR(zero_req_lim, S_IRUGO, show_zero_req_lim, NULL); static struct class_device_attribute *srp_host_attrs[] = { + &class_device_attr_remove_target, &class_device_attr_id_ext, &class_device_attr_ioc_guid, &class_device_attr_service_id, @@ -1814,6 +1819,23 @@ static CLASS_DEVICE_ATTR(add_target, S_IWUSR, NULL, srp_create_target); +static ssize_t srp_remove_target(struct class_device *cdev, +const char *buf, size_t count) +{ + int ret; + const char const remove_str[] = "remove"; + + if (strncmp(buf, "remove", sizeof(remove_str))) + return -EINVAL; + + ret = _srp_remove_target(host_to_target(class_to_shost(cdev))); + + if (ret) + return ret; + + return count; +} + static ssize_t show_ibdev(struct class_device *class_dev, char *buf) { struct srp_host *host = -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] SRP [PATCH 1/4] split srp_reconnect_target
Split the srp_reconnect_target to two functions _srp_remove_target and _srp_restore_target. These functions will be used later in patch series also to allow removal and restoration of a target from the sysfs. I made some changes in order to support this: 1) There are two new states: SRP_TARGET_DISCONNECTED - The state after _srp_remove_target was successfully executed and before _srp_restore_target is executed. SRP_TARGET_DISCONNECTING - The state while _srp_remove_target is executed. SRP_TARGET_CONNECTING is now the state while _srp_restore_target is executed. 2) The value of target->cm_id can be NULL. This happens after _srp_remove_target destroyed the old cm_id and before _srp_restore_target created the new cm_id. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-06-04 10:03:25.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-06-04 10:54:26.0 +0300 @@ -40,6 +40,7 @@ #include #include #include +#include #include @@ -373,7 +374,8 @@ static void srp_remove_work(void *target spin_unlock(&target->srp_host->target_lock); scsi_remove_host(target->scsi_host); - ib_destroy_cm_id(target->cm_id); + if (target->cm_id) + ib_destroy_cm_id(target->cm_id); srp_free_target_ib(target); scsi_host_put(target->scsi_host); } @@ -464,20 +466,57 @@ static void srp_reset_req(struct srp_tar srp_remove_req(target, req); } -static int srp_reconnect_target(struct srp_target_port *target) +static void srp_remove_target_port(struct srp_target_port *target) +{ + /* +* Kill our target port off. +* However, we have to defer the real removal because we might +* be in the context of the SCSI error handler now, which +* would deadlock if we call scsi_remove_host(). +*/ + spin_lock_irq(target->scsi_host->host_lock); + if (target->state != SRP_TARGET_REMOVED) { + target->state = SRP_TARGET_DEAD; + INIT_WORK(&target->work, srp_remove_work, target); + schedule_work(&target->work); + } + spin_unlock_irq(target->scsi_host->host_lock); +} + +static int _srp_remove_target(struct srp_target_port *target) { - struct ib_cm_id *new_cm_id; struct ib_qp_attr qp_attr; struct srp_request *req, *tmp; struct ib_wc wc; - int ret; + int ret = 0; spin_lock_irq(target->scsi_host->host_lock); - if (target->state != SRP_TARGET_LIVE) { + switch (target->state) { + case SRP_TARGET_REMOVED: + case SRP_TARGET_DEAD: + ret = -ENOENT; + break; + + case SRP_TARGET_DISCONNECTING: + case SRP_TARGET_CONNECTING: + ret = -EAGAIN; /* So that the caller will try again later - + after the connection ends one way or another */ + break; + + case SRP_TARGET_DISCONNECTED: + ret = -ENOTCONN; + break; + + case SRP_TARGET_LIVE: + break; + } + + if (ret) { spin_unlock_irq(target->scsi_host->host_lock); - return -EAGAIN; + return ret; } - target->state = SRP_TARGET_CONNECTING; + + target->state = SRP_TARGET_DISCONNECTING; spin_unlock_irq(target->scsi_host->host_lock); srp_disconnect_target(target); @@ -485,24 +525,14 @@ static int srp_reconnect_target(struct s * Now get a new local CM ID so that we avoid confusing the * target in case things are really fouled up. */ - new_cm_id = ib_create_cm_id(target->srp_host->dev->dev, - srp_cm_handler, target); - if (IS_ERR(new_cm_id)) { - ret = PTR_ERR(new_cm_id); - goto err; - } ib_destroy_cm_id(target->cm_id); - target->cm_id = new_cm_id; + target->cm_id = NULL; qp_attr.qp_state = IB_QPS_RESET; ret = ib_modify_qp(target->qp, &qp_attr, IB_QP_STATE); if (ret) goto err; - ret = srp_init_qp(target, target->qp); - if (ret) - goto err; - while (ib_poll_cq(target->cq, 1, &wc) > 0) ; /* nothing */ @@ -513,6 +543,49 @@ static int srp_reconnect_target(struct s target->tx_head = 0; target->tx_tail = 0; + spin_lock_irq(target->scsi_host->host_lock); + if (target->state == SRP_TARGET_DISCONNECTING) { + ret = 0; + target->state = SRP_TARGET_DISCONNECTED; +
[openib-general] SRP [PATCH 0/4] Kernel support for removal and restoration of target
Hi Roland, I'm sending 4 patches that implement kernel support for removal and restoration of a target (will be used by ibsrpdm). Some comments about them 1) The first patch splits reconnect to 2 two functions: _srp_remove_target and _srp_restore_target. _srp_remove_target uses the functions I sent in previous patch (Misc cleanups in ib_srp). If you want I can resend this patch without using the previous patch (But then there will be a problem with the previous patch :( ). 2) These patches implement the following behavior: When someone writes the string "remove" to /sys/class/scsi_host/host?/remove_target the corresponding target goes to a DISCONNECTED state (After closing the cm, and reset all pending requests). Now when the scsi performs queuecommand to this host a SCSI_MLQUEUE_HOST_BUSY is returned. These causes the scsi layer to wait until the target turns to LIVE state. This is very nice if the user that initiated the remove_target knows what he is doing and will perform a restore_target later. On the other hand it may be problematic if the target remains DISCONNECTED and user applications that try to access this target remain stuck in the kernel (in the scsi layer) I've several ideas on how to handle it (have a timeout after which queuecommand will return fail, try to perform a restore_target after a timeout, make sure the daemon will run a restore_target after a timeout) but I'm not sure they are the correct thing to do. I'm waiting for suggestions. In any case I believe we should apply these patches and add solution to this problem later. Please comment. -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] SRP: [PATCH] Misc cleanups in ib_srp
Hi, Misc cleanups in ib_srp. Please consider for 2.6.18. 1) I think that it is more efficient to move the req entries from req_list to free_list in srp_reconnect_target (rather than rebuild the free_list). (In any case this code is shorter). 2) This allows us to reuse code in srp_reset_device and srp_reconnect_target and call a new function srp_reset_req. 3) We can use list_move_tail in srp_remove_req. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-05-19 11:14:35.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-21 17:41:25.0 +0300 @@ -451,14 +451,26 @@ static void srp_unmap_data(struct scsi_c scmnd->sc_data_direction); } +static void srp_remove_req(struct srp_target_port *target, struct srp_request *req) +{ + srp_unmap_data(req->scmnd, target, req); + list_move_tail(&req->list, &target->free_reqs); +} + +static void srp_reset_req(struct srp_target_port *target, struct srp_request *req) +{ + req->scmnd->result = DID_RESET << 16; + req->scmnd->scsi_done(req->scmnd); + srp_remove_req(target, req); +} + static int srp_reconnect_target(struct srp_target_port *target) { struct ib_cm_id *new_cm_id; struct ib_qp_attr qp_attr; - struct srp_request *req; + struct srp_request *req, *tmp; struct ib_wc wc; int ret; - int i; spin_lock_irq(target->scsi_host->host_lock); if (target->state != SRP_TARGET_LIVE) { @@ -494,19 +506,12 @@ static int srp_reconnect_target(struct s while (ib_poll_cq(target->cq, 1, &wc) > 0) ; /* nothing */ - list_for_each_entry(req, &target->req_queue, list) { - req->scmnd->result = DID_RESET << 16; - req->scmnd->scsi_done(req->scmnd); - srp_unmap_data(req->scmnd, target, req); - } + list_for_each_entry_safe(req, tmp, &target->req_queue, list) + srp_reset_req(target, req); target->rx_head = 0; target->tx_head = 0; target->tx_tail = 0; - INIT_LIST_HEAD(&target->free_reqs); - INIT_LIST_HEAD(&target->req_queue); - for (i = 0; i < SRP_SQ_SIZE; ++i) - list_add_tail(&target->req_ring[i].list, &target->free_reqs); ret = srp_connect_target(target); if (ret) @@ -706,13 +711,6 @@ static int srp_map_data(struct scsi_cmnd return len; } -static void srp_remove_req(struct srp_target_port *target, struct srp_request *req) -{ - srp_unmap_data(req->scmnd, target, req); - list_del(&req->list); - list_add_tail(&req->list, &target->free_reqs); -} - static void srp_process_rsp(struct srp_target_port *target, struct srp_rsp *rsp) { struct srp_request *req; @@ -1349,11 +1347,8 @@ static int srp_reset_device(struct scsi_ spin_lock_irq(target->scsi_host->host_lock); list_for_each_entry_safe(req, tmp, &target->req_queue, list) - if (req->scmnd->device == scmnd->device) { - req->scmnd->result = DID_RESET << 16; - req->scmnd->scsi_done(req->scmnd); - srp_remove_req(target, req); - } + if (req->scmnd->device == scmnd->device) + srp_reset_req(target, req); spin_unlock_irq(target->scsi_host->host_lock); -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCH] ibsrpdm - allocate agent
The array agent is allocated with no entries. When agent is being accessed in create_agent there is a memory corruption. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/src/userspace/srptools/src/srp-dm.c === --- last_stable.orig/src/userspace/srptools/src/srp-dm.c2006-05-28 14:24:25.0 +0300 +++ last_stable/src/userspace/srptools/src/srp-dm.c 2006-05-28 14:55:41.0 +0300 @@ -536,7 +536,7 @@ static int get_port_list(int fd, uint32_ int main(int argc, char *argv[]) { int fd; - uint32_tagent[0]; + uint32_tagent[2]; char *cmd_name = strdup(argv[0]); while (1) { -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] Get email notification to svn commit
Hi, I understand that there is a way to be informed by mail when there is a commit to the svn repository. How can I register? Can I register to get notification only on svn commit to specific directories? Thanks -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading
On Wed, May 17, 2006 at 02:55:57AM +0300, Roland Dreier wrote: > > + /* > > + * We need 2 scsi_host_put becuase there are two get: > > + * in scsi_host_alloc and in scsi_add_host > > + */ > > + scsi_host_put(target->scsi_host); > >scsi_host_put(target->scsi_host); > > Hmm, this doesn't seem right to me. If I try this, then I get a crash > because the scsi_host is already gone after the first put. I verified > that the reference count is 1 before these puts, and with the > unmodified module I don't see anything left in /sys/class/scsi_host > after unloading the module. > > What kernel are you seeing problems with? I'm testing with an > up-to-date git kernel, although I doubt it makes a difference (did > SCSI reference counting change recently??). > > I do think there are some extra scsi_host_put() calls in > srp_remove_work() -- I think the double scsi_host_put() dates back to > a version (which I may never even have checked in) where there was a > scsi_host_get() to avoid the scsi_host going away between the > schedule_work() and srp_remove_work() actually running. > > So the patch below seems correct to me. > > What do you think? > > --- linux-kernel/infiniband/ulp/srp/ib_srp.c (revision 7245) > +++ linux-kernel/infiniband/ulp/srp/ib_srp.c (working copy) > @@ -353,7 +356,6 @@ static void srp_remove_work(void *target > spin_lock_irq(target->scsi_host->host_lock); > if (target->state != SRP_TARGET_DEAD) { > spin_unlock_irq(target->scsi_host->host_lock); > - scsi_host_put(target->scsi_host); > return; > } > target->state = SRP_TARGET_REMOVED; > @@ -367,8 +369,6 @@ static void srp_remove_work(void *target > ib_destroy_cm_id(target->cm_id); > srp_free_target_ib(target); > scsi_host_put(target->scsi_host); > - /* And another put to really free the target port... */ > - scsi_host_put(target->scsi_host); > } > > static int srp_connect_target(struct srp_target_port *target) > > Roland, As I told you before, your patch looks correct. Are you going to apply it? -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] Re: Fwd: [Bug 91] sizeof(srp_indirect_buf) wrongon 64-bit platforms
On Tue, May 23, 2006 at 10:10:05AM +0300, Arne Redlich wrote: > Roland Dreier <[EMAIL PROTECTED]> writes: > > > > Roland, maybe this means we need scsi/srp.h in svn for now? > > > svn is supposed to work on 2.6.16 ... > > > > As far as I can tell the bug in the header has no effect on how the IB > > SRP initiator works. > > Roland, > > I'm afraid it *does* have an effect, unfortunately. There's the following > code in ib_srp.c::srp_map_data(), around the lines 540 - 550: > > struct srp_indirect_buf *buf = (void *) cmd->add_data; > > /* snip */ > > buf->table_desc.va = cpu_to_be64(req->cmd->dma + > sizeof *cmd + > sizeof *buf); > > So if a target actually RDMA Reads the indirect descriptor table, it will use > a wrong address. > It looks to me that there is no effect after all. This buf->table_desc.va should point to the desc_list array in the srp_indirect_buf. When the code enters the values to this array (buf->desc_list[i]) it uses the address that is corresponding to sizeof *buf. To sum it up, there will be a change in the address the target sees but the data will still be in the address the target sees. > Arne > -- > Arne Redlich > Xiranet Communications GmbH > ___ > openib-general mailing list > openib-general@openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] SRP: [PATCH] Handling DREQ
On Fri, May 19, 2006 at 02:54:35AM +0300, Ishai Rabinovitz wrote: Hi Fixed the patch a little bit. > Hi, > > I got "Unhandled CM event 6" (IB_CM_DREQ_ERROR) and "Unhandled CM event 7" > (IB_CM_DREQ_RECEIVED). > > So here is a patch that handles these CM events. > > This is an initial patch. Maybe it will be more efficient to initiate a > reconnect > in case we get IB_CM_DREQ_RECEIVED. What do you think? > > Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> > Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-05-21 18:19:58.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-22 09:32:00.0 +0300 @@ -1217,6 +1217,20 @@ static int srp_cm_handler(struct ib_cm_i target->status = 0; break; + case IB_CM_DREQ_ERROR: + printk(KERN_ERR PFX + "IB_CM_DREQ_ERROR received - connection closed\n"); + /* no need to set comp - there will be a TIMEWAIT_EXIT */ + break; + + case IB_CM_DREQ_RECEIVED: + printk(KERN_WARNING PFX + "IB_CM_DREQ_RECEIVED received - connection closed\n"); + if (ib_send_cm_drep(cm_id, NULL, 0)) + printk(KERN_ERR PFX "ib_send_cm_drep failed\n"); + /* no need to set comp - there will be a TIMEWAIT_EXIT */ + break; + default: printk(KERN_WARNING PFX "Unhandled CM event %d\n", event->event); break; -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] SRP: [PATCH] Handling DREQ
Hi, I got "Unhandled CM event 6" (IB_CM_DREQ_ERROR) and "Unhandled CM event 7" (IB_CM_DREQ_RECEIVED). So here is a patch that handles these CM events. This is an initial patch. Maybe it will be more efficient to initiate a reconnect in case we get IB_CM_DREQ_RECEIVED. What do you think? Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-05-19 00:05:30.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-19 01:35:37.0 +0300 @@ -1214,13 +1231,29 @@ static int srp_cm_handler(struct ib_cm_i target->status = 0; break; + case IB_CM_DREQ_ERROR: + printk(KERN_ERR PFX + "IB_CM_DREQ_ERROR received - connection closed\n"); + /* no need to set comp - there will be a TIMEWAIT_EXIT */ + break; + + case IB_CM_DREQ_RECEIVED: + printk(KERN_ERR PFX + "IB_CM_DREQ_RECEIVED received - connection closed\n"); + if (ib_send_cm_drep(target->cm_id, NULL, 0)) + printk(KERN_ERR PFX "ib_send_cm_drep failed\n"); + /* no need to set comp - there will be a TIMEWAIT_EXIT */ + break; + default: printk(KERN_WARNING PFX "Unhandled CM event %d\n", event->event); break; } - if (comp) + if (comp) { + printk(KERN_ERR PFX "srp_cm_handler: complete to %p\n", target); complete(&target->done); + } kfree(qp_attr); -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] SRP [PATCH] Cleaning in srp_remove_one
3 changes in the same place: 1) The if statement is redundant. 2) There is no need to save the flags - it is inside a mutex_lock. 3) We hold the mutex for the list and we are not deleting from the list so there is no need for list_for_each_entry_safe. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-05-14 14:22:12.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-14 14:26:54.0 +0300 @@ -1750,7 +1750,6 @@ static void srp_remove_one(struct ib_dev struct srp_host *host, *tmp_host; LIST_HEAD(target_list); struct srp_target_port *target, *tmp_target; - unsigned long flags; dev_list = ib_get_client_data(device, &srp_client); @@ -1767,12 +1766,10 @@ static void srp_remove_one(struct ib_dev * commands and don't try to reconnect. */ mutex_lock(&host->target_mutex); - list_for_each_entry_safe(target, tmp_target, -&host->target_list, list) { - spin_lock_irqsave(target->scsi_host->host_lock, flags); - if (target->state != SRP_TARGET_REMOVED) - target->state = SRP_TARGET_REMOVED; - spin_unlock_irqrestore(target->scsi_host->host_lock, flags); + list_for_each_entry(target, &host->target_list, list) { + spin_lock_irq(target->scsi_host->host_lock); + target->state = SRP_TARGET_REMOVED; + spin_unlock_irq(target->scsi_host->host_lock); } mutex_unlock(&host->target_mutex); -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] SRP [PATCH] Looks like a potantial bug
On Wed, May 17, 2006 at 06:40:04PM +0300, Ishai Rabinovitz wrote: > Hi, > > While doing a code review I found a potential bug. > I did not manage to execute a test to check this code. > Please take a look: Sorry, I made a mistake in the patch. Please look at this one. In srp_reconnect_target it uses req->scmnd->scsi_done(req->scmnd); (like in the patch) Ishai > Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> > -- > Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c > === > --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-17 > 16:24:24.0 +0300 > +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-17 > 17:13:47.0 +0300 > @@ -1326,7 +1326,7 @@ static int srp_reset_device(struct scsi_ > list_for_each_entry_safe(req, tmp, &target->req_queue, list) > if (req->scmnd->device == scmnd->device) { > req->scmnd->result = DID_RESET << 16; > - scmnd->scsi_done(scmnd); > + req->scmnd->scsi_done(req->scmnd); > srp_remove_req(target, req); > } > > -- > Ishai Rabinovitz > ___ > openib-general mailing list > openib-general@openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] SRP [PATCH] Looks like a potantial bug
Hi, While doing a code review I found a potential bug. I did not manage to execute a test to check this code. Please take a look: Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> -- Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-05-17 16:24:24.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-17 17:13:47.0 +0300 @@ -1326,7 +1326,7 @@ static int srp_reset_device(struct scsi_ list_for_each_entry_safe(req, tmp, &target->req_queue, list) if (req->scmnd->device == scmnd->device) { req->scmnd->result = DID_RESET << 16; - scmnd->scsi_done(scmnd); + req->scmnd->scsi_done(scmnd); srp_remove_req(target, req); } -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading
On Wed, May 17, 2006 at 02:55:57AM +0300, Roland Dreier wrote: > Hmm, this doesn't seem right to me. If I try this, then I get a crash > because the scsi_host is already gone after the first put. I verified > that the reference count is 1 before these puts, and with the > unmodified module I don't see anything left in /sys/class/scsi_host > after unloading the module. > > What kernel are you seeing problems with? I'm testing with an > up-to-date git kernel, although I doubt it makes a difference (did > SCSI reference counting change recently??). > > I do think there are some extra scsi_host_put() calls in > srp_remove_work() -- I think the double scsi_host_put() dates back to > a version (which I may never even have checked in) where there was a > scsi_host_get() to avoid the scsi_host going away between the > schedule_work() and srp_remove_work() actually running. > > So the patch below seems correct to me. > > What do you think? I could not reproduce the problem again, so this patch works for me. -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading
On Wed, May 17, 2006 at 02:56:58AM +0300, Roland Dreier wrote: > BTW, I think the patch below is correct as well. This avoids problems > where the SRP driver waits forever for a completion, for example if > sending the DREQ fails because the connection has already been > disconnected by the target. > > Does this scenario seem like the deadlock you thought you saw? > > --- linux-kernel/infiniband/ulp/srp/ib_srp.c (revision 7245) > +++ linux-kernel/infiniband/ulp/srp/ib_srp.c (working copy) > @@ -342,7 +342,10 @@ static void srp_disconnect_target(struct > /* XXX should send SRP_I_LOGOUT request */ > > init_completion(&target->done); > - ib_send_cm_dreq(target->cm_id, NULL, 0); > + if (ib_send_cm_dreq(target->cm_id, NULL, 0)) { > + printk(KERN_DEBUG PFX "Sending CM DREQ failed\n"); > + return; > + } > wait_for_completion(&target->done); > } > I don't think this caused the deadlock I had. Still it looks like an important patch. -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] SRP: [PATCH] Releasing the scsi_host when unloading
On Sun, May 14, 2006 at 02:02:01PM +0300, Ishai Rabinovitz wrote: > Hi, > > After loading ib_srp module, adding a target and then unloading the ib_srp > target the scsi_host directory in /sys/class/scsi_host/ still exists. > It looks like the srp code does not release the scsi_host it had allocated. > > After examining the code I found out that when executing srp_remove_work > (the removal of one target) scsi_host_put is called twice, but when unloading > the module in srp_remove_one scsi_host_put is called only once. > > It looks like the correct thing is to execute scsi_host_put twice (once for > the call to scsi_add_host and once for the call to scsi_host_alloc). > > So, I suggest the next patch: Forgot to sign. Here it is again with the Signed-off-by line. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-05-14 13:09:23.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-14 13:25:48.0 +0300 @@ -357,7 +357,6 @@ static void srp_remove_work(void *target spin_lock_irq(target->scsi_host->host_lock); if (target->state != SRP_TARGET_DEAD) { spin_unlock_irq(target->scsi_host->host_lock); - scsi_host_put(target->scsi_host); return; } target->state = SRP_TARGET_REMOVED; @@ -1790,6 +1789,11 @@ static void srp_remove_one(struct ib_dev srp_disconnect_target(target); ib_destroy_cm_id(target->cm_id); srp_free_target_ib(target); + /* +* We need 2 scsi_host_put becuase there are two get: +* in scsi_host_alloc and in scsi_add_host +*/ + scsi_host_put(target->scsi_host); scsi_host_put(target->scsi_host); } ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] SRP: [PATCH] Releasing the scsi_host when unloading
Hi, After loading ib_srp module, adding a target and then unloading the ib_srp target the scsi_host directory in /sys/class/scsi_host/ still exists. It looks like the srp code does not release the scsi_host it had allocated. After examining the code I found out that when executing srp_remove_work (the removal of one target) scsi_host_put is called twice, but when unloading the module in srp_remove_one scsi_host_put is called only once. It looks like the correct thing is to execute scsi_host_put twice (once for the call to scsi_add_host and once for the call to scsi_host_alloc). So, I suggest the next patch: Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-05-14 13:09:23.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-14 13:25:48.0 +0300 @@ -357,7 +357,6 @@ static void srp_remove_work(void *target spin_lock_irq(target->scsi_host->host_lock); if (target->state != SRP_TARGET_DEAD) { spin_unlock_irq(target->scsi_host->host_lock); - scsi_host_put(target->scsi_host); return; } target->state = SRP_TARGET_REMOVED; @@ -1790,6 +1789,11 @@ static void srp_remove_one(struct ib_dev srp_disconnect_target(target); ib_destroy_cm_id(target->cm_id); srp_free_target_ib(target); + /* +* We need 2 scsi_host_put becuase there are two get: +* in scsi_host_alloc and in scsi_add_host +*/ + scsi_host_put(target->scsi_host); scsi_host_put(target->scsi_host); } ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] Re: [openfabrics-ewg] SRP: changes to ibsrpdm
On Thu, May 04, 2006 at 11:05:51PM +0300, Michael S. Tsirkin wrote: > Quoting r. Roland Dreier <[EMAIL PROTECTED]>: > > Why can't the daemon keep > > track of which targets it has added and not add a target twice? Is > > there some reason that the kernel has to be involved? > > What if the daemon is killed/restarted? > > -- > MST What if there are two daemons by accident? -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] SRP: changes to ibsrpdm
Hi, After implementing and submitting the ibsrpdm patches I got the following important remarks: 1) Sometimes there is a need to add a target twice. 2) It it unnecessary complication to the kernel to look for a target in the targets list in order to remove it. 3) There is a conceptual problem in the list_target query - In sysfs each file should report only one value. Before implementing the fixes according to this remarks, I want to see if there are any comments to the changes I'm going to do. I'm going to change ibsrpdm and SRP driver code in the following manner: 1) There is going to be an attribute for a target indicating if it was added by the daemon (Named daemons). Only the daemon should add targets with this attribute set. 2) The kernel will not allow the daemon to add the same target twice. Regular activation of add_target can add multiple instances of the same target. 3) list_target query will be removed. The information will be in several directories in sysfs (One for each target) - Roland, vu, can you send me a pointer that explains which target information can be found in the sysfs today? 4) I'll change the way I've implemented the activation of remove target. There will be a remove target file for each existing target directory in sysfs, echo 1 to this file will remove the corresponding target. 5) The daemon will remove only targets that it added. (Has the daemons attribute set). 6) Adding execution modes to ibsrpdm: a) When activated without flags or with -c flag, ibsrpdm will executes once (has before) and display the targets in the network. b) When activated with -l flag, ibsrpdm will be activated in a loop and display at each cycle the targets that join the network and the targets the leaves the network. c) When activated with -l and -a flags, ibsrpdm will be activated as a daemon that adds targets that join the network. d) When activated with -l and -r flags, ibsrpdm will be activated as a daemon that removes targets that leave the network. e) When activated with -l, -a and -r flags, ibsrpdm will be activated as a daemon that adds targets that join the network, and removes targets that leave the network. The reason we need option b, is because costumers may want to add the targets in a certain order (for binding purposes). Any comments? Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] Re: [openfabrics-ewg] Current OFED kernel snapshot - problems in back porting SRP to RH4
On Tue, May 02, 2006 at 10:34:29AM +0300, Aviram Gutman wrote: > > Yes, we have uploaded a pre version of RC4. You can find it in: > > URL: https://openib.org/svn/gen2/branches/1.0/ofed/releases > > OFED-1.0-rc4-pre4.tgz > OFED-1.0-rc4-pre4.tgz.md5sum > > Please pay attention that we still face issues. The following is the diff > between RC3 and the pre: > > 1. Bug fixes according to problems reported. > > 2. SRP - with new features: FMR, tunable parameters, SRP daemon - We have > an issue with the back port of SRP to RH4 U2 and U3. Ishai will issue a mail > with explanation. > > 3. Open MPI - new package based on 1.1a3 - Please be noted that RPM > building process failed. Vlad will > > 4. RDS - new version from main trunk > > 5. Kernel code based on git > > 6. Standard network configuration > > > Known issues: > 1. ipath installation fails on 2.6.9 - 2.6.11* kernels > 2. OSU MPI compilation fails on SLES10, PPC64 > 3. SRP is not supported on 2.6.9 - 2.6.13* kernels - Ishai will follow up > with details > 4. Open MPI RPM build process fails - Jeff, will you be able to send us fixes > by Wed? > > > Regards, >Aviram > > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Doug Ledford > Sent: Monday, May 01, 2006 11:53 PM > To: [EMAIL PROTECTED] > Subject: [openfabrics-ewg] Current OFED kernel snapshot > > There's rumored to be a significant number of changes between rc3 and rc4 on > the kernel module front. I would like to get started on integrating those > changes sooner rather than later. So, where would I go to get a snapshot of > the latest OFED kernel > tree. So far I've only found kernel trees under the tags directory and > obviously the rc4 tag hasn't been populated yet. > > -- > Doug Ledford <[EMAIL PROTECTED]> > Red Hat, Inc. > 1801 Varsity Dr. > Raleigh, NC 27606 > > ___ > openfabrics-ewg mailing list > [EMAIL PROTECTED] > http://openib.org/mailman/listinfo/openfabrics-ewg Hi We have a problem when trying to back port SRP to RH4 U2 and U3 (Actually to any kernel earlier than 2.6.13). The problem is when the SCSI driver is calling to eh_abort_handler, or to eh_device_reset_handler. In the current kernel (starting from 2.6.13) this call is made without host_lock spin-lock locked. In the SRP code that performs the abort and the reset (srp_send_tsk_mgmt) we send a message to the target and we wait for a response from the target. In early versions of the kernel the SCSI driver performs irq_spinlock_save to the host_lock before calling to the abort or reset handlers. This creates a problem: The SRP driver can not go to sleep until the target will answer. Any ideas? -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] Re: [PATCH] SRP: Avoid a potential deadlock
On Mon, May 01, 2006 at 11:27:24AM -0700, Roland Dreier wrote: > Ishai> Avoid a potential dead-lock. In srp_disconnect_target > Ishai> there is a call to ib_send_cm_dreq and a wait for > Ishai> completion If when getting DREP there is no comp no one > Ishai> will end this wait > > I thought that after the DREP is received, the CM will go through > timewait and we will eventually get a TIMEWAIT_EXIT event (with a > completion). Am I wrong? Have you actually seen this deadlock happen > in practice? > > - R. I had a deadlock and I suspected at this. I'm not sure that it was the reason for the deadlock. Vu, What do you think? -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] Re: [PATCH 00/12] SRP: Changing ibsrpdm
On Mon, May 01, 2006 at 10:53:36AM -0700, Roland Dreier wrote: > Ishai> Hi, I'm going to send 12 patches. 6 patches for the kernel, > Ishai> and 6 for the userspace ibsrpdm. The kernel patches avoid > Ishai> adding the same target twice, allow the removal of a > Ishai> target, and add a query about the connected targets. > > In the future can you use a different descriptive title for each patch? OK > > Also (although I haven't reviewed the actual code yet) this mostly > makes sense, but I'm not sure we want to disallow connecting to the > same target twice. Userspace may want to implement a policy of one > conncetion per target, but having multiple connections to the same > target for multipathing/failover seems like something the kernel > should allow. What was your reason for forbidding this? > > - R. As I understand it, the path in multipathing is going to be part of the attributes of the connection to the target. So there will be no problem to add the same target twice, if it has a different path leading to it. -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH 06/12] SRP: Changing ibsrpdm
On Mon, May 01, 2006 at 04:50:32PM +0300, Muli Ben-Yehuda wrote: > On Mon, May 01, 2006 at 02:28:48PM +0300, Ishai Rabinovitz wrote: > > > > Support a display of list of target from user level. > > > > Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> > > Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c > > === > > --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-04-21 > > 01:13:04.0 +0300 > > +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-21 > > 03:56:05.0 +0300 > > @@ -1730,6 +1730,63 @@ end: > > > > static CLASS_DEVICE_ATTR(remove_target, S_IWUSR, NULL, srp_remove_target); > > > > +#define TARGET_INFO_BUF_SIZE 126 > > + > > +static ssize_t list_targets(struct class_device *class_dev, char *buf) > > +{ > > + struct srp_host *host = > > + container_of(class_dev, struct srp_host, class_dev); > > + struct srp_target_port *target; > > + int printed=0, ret; > > + > > + mutex_lock(&host->target_mutex); > > + list_for_each_entry(target, &host->target_list, list) > > Can this race with list addition / removal? I saw that you removed the > lock in an earlier patch? No, In an erlier patch I did not removed the lock, I enlarged it scope to include the entire call to srp_find_target. > > > + if (target->state == SRP_TARGET_LIVE) { > > You'd have an easier time with the indentation if you'd do > > if (target->state != SRP_TARGET_LIVE) > continue; > > here > > > + ret = sprintf(buf+printed, > > + "id_ext=%016llx,ioc_guid=%016llx," > > + "dgid=%04x%04x%04x%04x%04x%04x%04x%04x," > > + "pkey=%04x,service_id=%016llx\n", > > + (unsigned long long) > > + be64_to_cpu(target->id_ext), > > + (unsigned long long) > > + be64_to_cpu(target->ioc_guid), > > + (int) be16_to_cpu(*(__be16 *) > > + &target->path.dgid.raw[0]), > > + (int) be16_to_cpu(*(__be16 *) > > + &target->path.dgid.raw[2]), > > + (int) be16_to_cpu(*(__be16 *) > > + &target->path.dgid.raw[4]), > > + (int) be16_to_cpu(*(__be16 *) > > + &target->path.dgid.raw[6]), > > + (int) be16_to_cpu(*(__be16 *) > > + &target->path.dgid.raw[8]), > > + (int) be16_to_cpu(*(__be16 *) > > + &target->path.dgid.raw[10]), > > + (int) be16_to_cpu(*(__be16 *) > > + &target->path.dgid.raw[12]), > > + (int) be16_to_cpu(*(__be16 *) > > + &target->path.dgid.raw[14]), > > This is pretty horrible - could you use show_dgid() here? Id will add a redundant copy of the buffer. > > Cheers, > Muli -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [openib-general] [PATCH 04/12] SRP: Changing ibsrpdm
On Mon, May 01, 2006 at 04:43:23PM +0300, Muli Ben-Yehuda wrote: > On Mon, May 01, 2006 at 02:27:39PM +0300, Ishai Rabinovitz wrote: > > > > Do not add the same target twice. > > > > Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> > > Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c > > === > > --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-04-25 > > 15:17:34.0 +0300 > > +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-25 > > 15:19:37.0 +0300 > > @@ -1478,7 +1478,8 @@ static int srp_parse_options(const char > > printk(KERN_WARNING PFX "bad max sect parameter > > '%s'\n", p); > > goto out; > > } > > - target->scsi_host->max_sectors = token; > > + if (target->scsi_host != NULL) > > + target->scsi_host->max_sectors = token; > > break; > > This chunk does not look related to the rest. Is a NULL > target->scsi_host legal here? if not, the check should be removed as > we'd rather take an oops here than hide the problem behind the NULL > pointer check. > > > +/* srp_find_target - If the target exists return it in target, > > + otherwise target is set to NULL. > > + host->target_mutex should be hold */ > > Please use the usual kernel > /* > * stuff > */ > style for multi line comments. OK, Thanks. > > > +static int srp_find_target(const char *buf, struct srp_host *host, > > + struct srp_target_port **target) > > +{ > > + struct srp_target_port *target_to_find, *curr_target; > > + int ret, i; > > + > > + target_to_find = kzalloc(sizeof *target_to_find, GFP_KERNEL); > > + ret = srp_parse_options(buf, target_to_find); > > + if (ret) > > + goto free; > > + > > + list_for_each_entry(curr_target, &host->target_list, list) > > + if (target_to_find->ioc_guid == curr_target->ioc_guid && > > + target_to_find->id_ext == curr_target->id_ext && > > + target_to_find->path.pkey == curr_target->path.pkey && > > + target_to_find->service_id == curr_target->service_id) { > > + for (i = 0; i < 16; ++i) > > + if (target_to_find->path.dgid.raw[i] != > > curr_target->path.dgid.raw[i]) > > + break; > > The conditional and check here probably deserves an inline helper > called same_target() or some such. > > > + if (i == 16) { > > + *target = curr_target; > > + goto free; > > + } > > + } > > + > > + *target = NULL; > > + > > +free: > > + kfree(target_to_find); > > + return 0; > > We always return 0 - either this should return void, or you meant to > return ret here instead of 0? You are right as usual, We should return ret. > > > +} > > + > > static ssize_t srp_create_target(struct class_device *class_dev, > > const char *buf, size_t count) > > { > > struct srp_host *host = > > container_of(class_dev, struct srp_host, class_dev); > > struct Scsi_Host *target_host; > > - struct srp_target_port *target; > > + struct srp_target_port *target, *existing_target = NULL; > > int ret; > > int i; > > > > + /* first check if the target already exists */ > > + > > + mutex_lock(&host->target_mutex); > > + ret = srp_find_target(buf, host, &existing_target); > > + if (ret) > > + goto unlock_mutex; > > + > > + if (existing_target) { > > + /* target already exists */ > > + spin_lock_irq(existing_target->scsi_host->host_lock); > > why _irq and not _irqsave? Are you sure this code can't ever be called > with interrupts off via some other path? This function is being called from userspace (writing to /sys/class/infiniband_srp/.../add_target) so no need for irqsave. Do you think we should always use irqsave just to be on the safe side (Maybe in the future someone else will call us)? > > Cheers, > Muli --- Resending the fixed patch -
Re: [openib-general] [PATCHE 02/12] SRP: changing ibsrpdm
On Mon, May 01, 2006 at 04:33:42PM +0300, Muli Ben-Yehuda wrote: > On Mon, May 01, 2006 at 02:25:46PM +0300, Ishai Rabinovitz wrote: > > > > Move the destruction of the host and the removal from a list to a function. > > > > Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> > > > > Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c > > === > > --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-04-23 > > 14:08:03.0 +0300 > > +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-24 > > 10:47:00.0 +0300 > > @@ -344,6 +344,16 @@ static void srp_disconnect_target(struct > > wait_for_completion(&target->done); > > } > > > > +static void destruct_scsi_host_and_target(struct srp_target_port *target, > > int disconnect_target) > > +{ > > + scsi_remove_host(target->scsi_host); > > + if (disconnect_target) > > + srp_disconnect_target(target); > > + ib_destroy_cm_id(target->cm_id); > > + srp_free_target_ib(target); > > + scsi_host_put(target->scsi_host); > > +} > > + > > static void srp_remove_work(void *target_ptr) > > { > > struct srp_target_port *target = target_ptr; > > @@ -357,10 +374,7 @@ static void srp_remove_work(void *target > > list_del(&target->list); > > mutex_unlock(&target->srp_host->target_mutex); > > > > - scsi_remove_host(target->scsi_host); > > - ib_destroy_cm_id(target->cm_id); > > - srp_free_target_ib(target); > > - scsi_host_put(target->scsi_host); > > + destruct_scsi_host_and_target(target, 0); > > Is not disconnecting from the target here actually the right thing to > do? considering we're then destroying the target's queue pairs and > freeing it? > > Cheers, > Muli Hi Muli, srp_remove_target is being called only when we were unable to reconnect in srp_reconnect_target so the target is already disconnected. Ishai -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCH] SRP: Avoid a potential deadlock
Hi, I think there is a potential deadlock when disconnecting from the CM. Roland, can you look at this patch and check if it is needed. Thanks Ishai -- Avoid a potential dead-lock. In srp_disconnect_target there is a call to ib_send_cm_dreq and a wait for completion If when getting DREP there is no comp no one will end this wait Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-04-17 10:03:08.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-17 10:06:19.0 +0300 @@ -1194,6 +1194,7 @@ static int srp_cm_handler(struct ib_cm_i break; case IB_CM_DREP_RECEIVED: + comp = 1; break; case IB_CM_TIMEWAIT_EXIT: -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCH 12/12] SRP: Changing ibsrpdm
The query can be improved if working against OpenSM that supports the option to ask about a certain bit in the capability mask. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/src/userspace/srptools/src/srp-dm.c === --- last_stable.orig/src/userspace/srptools/src/srp-dm.c2006-04-21 06:26:25.0 +0300 +++ last_stable/src/userspace/srptools/src/srp-dm.c 2006-04-21 06:30:36.0 +0300 @@ -97,10 +97,16 @@ static inline uint64_t htonll(uint64_t x #define SIZE_OF_QUERY_RESPONSE (1 << 18) +#define SM_SUPPORTS_QUERY_OF_PART_OF_CAP_MASK_BIT_MASK (1 << 13) + +#define TEST_ONLY_SET_BIT_BIT_MASK (1 << 31) + #define N_COMP_MASK_NODE_TYPE htonll(1 << 4); #define N_COMP_MASK_LID htonll(1); +#define N_COMP_MASK_CAPABILITY_MASK htonll(1 << 7); + #define INITIAL_SIZE_OF_TARGET_TABLE 10 #define SLEEP_TIME 60 @@ -180,8 +186,6 @@ static void empty_set(targets_set *set) static void destroy_set(targets_set *set) { - int i; - empty_set(set); free(set->array); } @@ -679,6 +683,63 @@ static int get_port_info(int fd, uint32_ return 0; } +int get_class_port_info(int fd, uint32_t agent[2], uint16_t dlid, + int *is_mask_match_supported) +{ + struct ib_user_mad out_mad, in_mad; + struct srp_dm_rmpp_sa_mad *out_sa_mad, *in_sa_mad; + struct srp_dm_mad *in_dm_mad; + struct srp_dm_class_port_info *class_port_info; + + in_sa_mad = (void *) in_mad.data; + in_dm_mad = (void *) in_mad.data; + out_sa_mad = (void *) out_mad.data; + + init_srp_dm_mad(&out_mad, agent[1], sm_lid, SRP_DM_ATTR_CLASS_PORT_INFO, 0); + + out_sa_mad->mgmt_class= SRP_MGMT_CLASS_SA; + out_sa_mad->class_version = 2; + + if (send_and_get(fd, &out_mad, &in_mad, 0) < 0) + return -1; + +/* TODO: to handle forwarding */ + class_port_info = (void *) in_sa_mad->data; + *is_mask_match_supported = + !!(ntohs(class_port_info->cap_mask) & + SM_SUPPORTS_QUERY_OF_PART_OF_CAP_MASK_BIT_MASK); + + return 0; +} + +int get_node_info(int fd, uint32_t agent[2], uint16_t dlid, uint64_t *n_guid) +{ + struct ib_user_mad out_mad, in_mad; + struct srp_dm_rmpp_sa_mad *out_sa_mad, *in_sa_mad; + struct srp_dm_mad *in_dm_mad; + struct srp_sa_node_rec *node_info; + + in_sa_mad = (void *) in_mad.data; + in_dm_mad = (void *) in_mad.data; + out_sa_mad = (void *) out_mad.data; + + init_srp_dm_mad(&out_mad, agent[1], sm_lid, SRP_SA_ATTR_NODE, 0); + + out_sa_mad->mgmt_class= SRP_MGMT_CLASS_SA; + out_sa_mad->class_version = 2; + out_sa_mad->comp_mask = htonll((uint64_t)1); /* LID */ + node_info = (void *) out_sa_mad->data; + node_info->lid= htons(dlid); + + if (send_and_get(fd, &out_mad, &in_mad, 0) < 0) + return -1; + + node_info = (void *) in_sa_mad->data; + *n_guid = node_info->port_guid; + + return 0; +} + static int get_port_list(int fd, uint32_t agent[2]) { uint8_t in_mad_space[SIZE_OF_QUERY_RESPONSE]; @@ -686,19 +747,11 @@ static int get_port_list(int fd, uint32_ struct srp_dm_rmpp_sa_mad *out_sa_mad, *in_sa_mad; struct srp_sa_node_rec *node; ssize_t len; - char val[64]; int size; int i; uint64_t subnet_prefix; int isdm; - if (read_file(port_sysfs_path, "sm_lid", val, sizeof val) < 0) { - pr_err("Couldn't read SM LID\n"); - return -1; - } - - sm_lid = strtol(val, NULL, 0); - in_sa_mad = (void *) in_mad->data; out_sa_mad = (void *) out_mad.data; @@ -762,6 +815,57 @@ static int get_existing_targets() return 0; } +int get_port_list_new(int fd, uint32_t agent[2]) +{ + uint8_t in_mad_space[SIZE_OF_QUERY_RESPONSE]; + struct ib_user_mad out_mad, *in_mad=(void *) in_mad_space; + struct srp_dm_rmpp_sa_mad *out_sa_mad, *in_sa_mad; + struct srp_sa_port_info_rec *port_info; + ssize_t len; + int size; + int i; + uint64_t subnet_prefix; + uint16_t lid; + uint64_t guid; + + in_sa_mad = (void *) in_mad->data; + out_sa_mad = (void *) out_mad.data; + + init_srp_dm_mad(&out_mad, agent[1], sm_lid, SRP_SA_ATTR_PORT_INFO, + TEST_ONLY_SET_BIT_BIT_MASK); + + out_sa_mad->mgmt_class = SRP_MGMT_CLASS_SA; + out_sa_mad->method = SRP_SA_METHOD_GET_TABLE; + out_sa_mad->class_version = 2; + out_sa_mad-
[openib-general] [PATCH 11/12] SRP: Changing ibsrpdm
Add -l option to ibsrpdm. This option activates a daemon that queries for the targets in a loop and tells ib_srp about new target that appears and old target that disappears. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/src/userspace/srptools/src/srp-dm.c === --- last_stable.orig/src/userspace/srptools/src/srp-dm.c2006-04-21 04:47:54.0 +0300 +++ last_stable/src/userspace/srptools/src/srp-dm.c 2006-04-21 05:16:11.0 +0300 @@ -35,6 +35,7 @@ #include #include #include +#include #include "ib_user_mad.h" #include "srp-dm.h" @@ -49,17 +50,39 @@ static uint32_t tid = 1; static int cmd = 0; static int verbose = 0; +static int loop= 0; +static int add_target_fd; +static int remove_target_fd; +static char *list_targets_path; + +#define pr_log(arg...) \ + do {\ + if (verbose) { \ + if (loop) \ + syslog(LOG_WARNING, arg); \ + else if (!cmd) \ + printf(arg);\ + } \ + } while (0) #define pr_human(arg...) \ do {\ - if (!cmd) \ + if (!cmd && !loop) \ printf(arg);\ } while (0) #define pr_cmd(arg...) \ do {\ - if (cmd)\ - printf(arg);\ + if (cmd && !loop) \ + printf(arg);\ + } while (0) + +#define pr_err(arg...) \ + do {\ + if (loop) \ + syslog(LOG_WARNING, arg); \ + else\ + fprintf(stderr, arg); \ } while (0) #if __BYTE_ORDER == __LITTLE_ENDIAN @@ -78,11 +101,106 @@ static inline uint64_t htonll(uint64_t x #define N_COMP_MASK_LID htonll(1); +#define INITIAL_SIZE_OF_TARGET_TABLE 10 + +#define SLEEP_TIME 60 + +static int size_of_target_table = INITIAL_SIZE_OF_TARGET_TABLE; + +/* Implementaion of target set in an array. +* Assumption: there will be small number of targets +* TODO: If this assumption does not hold, +*change the implemantaion to a hash or a tree +*/ + +typedef struct { + char **array; + unsigned int next_index; + unsigned int size; +} targets_set; + +static int create_set(targets_set *set) +{ + set->next_index = 0; + set->size = size_of_target_table; + set->array = calloc(set->size, sizeof(char *)); + if (set->array == NULL) { + perror("calloc:"); + return -1; + } + + return 0; +} + +static int add_to_set(targets_set *set, char *target_info) +{ + if (set->next_index == set->size) { + if (set->size == size_of_target_table) + size_of_target_table *= 2; + set->size = size_of_target_table; + set->array = realloc(set->array, set->size * sizeof(char *)); + if (set->array == NULL) { + pr_err("realloc: %s\n", strerror(errno)); + return -1; + } + } + set->array[set->next_index] = strdup(target_info); + if (set->array[set->next_index] == NULL) { + pr_err("strdup: %s\n", strerror(errno)); + return -1; + } + ++set->next_index; + + return 0; +} + +static int remove_from_set(targets_set *set, char *target_info) +{ + int i; + + for (i = 0; i < set->next_index; ++i) + if (!strcmp(set->array[i], target_info)) { + free(set->array[i]); + set->array[i] = set->array[set->next_index]; + --set->next_index; + return 0; + } + + return -1; +} + +static void empty_set(targets_set *set) +{ + int i; + + for (i = 0; i < set->next_index; ++i) + free(set->array[i]); + set->next_index = 0; +} + +static void destroy_set(targets_set *set) +{ + int i; + + empty_set(set); + free(set->array); +} + +/* for_each_entry_in_set(char *tar
[openib-general] [PATCH 10/12] SRP: Changing ibsrpdm
Add a function send_and_get that handles the communication and retries. Reduce redundancy. Increment TID on retry. Bound the number of retries. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/src/userspace/srptools/src/srp-dm.c === --- last_stable.orig/src/userspace/srptools/src/srp-dm.c2006-04-21 01:35:10.0 +0300 +++ last_stable/src/userspace/srptools/src/srp-dm.c 2006-04-21 01:41:28.0 +0300 @@ -85,6 +85,36 @@ static void usage(const char *argv0) fprintf(stderr, "Usage: %s [-vc] [-d ]\n", argv0); } +#define NUM_OF_RETRIES 3 +int send_and_get(int fd, struct ib_user_mad *out_mad, +struct ib_user_mad *in_mad, long in_mad_size) +{ + int i, len, in_mad_real_size; + struct srp_dm_mad *out_dm_mad; + + in_mad_real_size = (in_mad_size ? in_mad_size : sizeof(struct ib_user_mad)); + for (i = 0; i < NUM_OF_RETRIES; ++i) + { + len = write(fd, out_mad, sizeof(struct ib_user_mad)); + if (len != sizeof(struct ib_user_mad)) { + fprintf(stderr, "write: %s\n", strerror(errno)); + return -1; + } + + len = read(fd, in_mad, in_mad_real_size); + if ((in_mad_size == 0 && len == in_mad_real_size) || + (in_mad_size != 0 && len > 0)) + return len; + else if (in_mad->hdr.status != ETIMEDOUT) { + fprintf(stderr, "%s/%d: read: %s\n", __func__, __LINE__, strerror(errno)); + return -1; + } + out_dm_mad = (void *) out_mad->data; + ((uint32_t *) &out_dm_mad->tid)[1] = tid++; + } + return -1; +} + static int read_file(const char *dir, const char *file, char *buf, size_t size) { char *path; @@ -234,19 +264,8 @@ static int set_class_port_info(int fd, u ((uint16_t *) cpi->trap_gid)[i] = htons(strtol(val + i * 5, NULL, 16)); } -again: - if (write(fd, &out_mad, sizeof out_mad) != sizeof out_mad) { - perror("write"); + if (send_and_get(fd, &out_mad, &in_mad, 0) < 0) return -1; - } - - if (read(fd, &in_mad, sizeof in_mad) != sizeof in_mad) { - if (in_mad.hdr.status == ETIMEDOUT) - goto again; - fprintf(stderr, "%s/%d: ", __func__, __LINE__); - perror("read"); - return -1; - } in_dm_mad = (void *) in_mad.data; if (in_dm_mad->status) { @@ -266,19 +285,8 @@ static int get_iou_info(int fd, uint32_t init_srp_dm_mad(&out_mad, agent[1], dlid, SRP_DM_ATTR_IO_UNIT_INFO, 0); -again: - if (write(fd, &out_mad, sizeof out_mad) != sizeof out_mad) { - perror("write"); - return -1; - } - - if (read(fd, &in_mad, sizeof in_mad) != sizeof in_mad) { - if (in_mad.hdr.status == ETIMEDOUT) - goto again; - fprintf(stderr, "%s/%d: ", __func__, __LINE__); - perror("read"); + if (send_and_get(fd, &out_mad, &in_mad, 0) < 0) return -1; - } in_dm_mad = (void *) in_mad.data; if (in_dm_mad->status) { @@ -300,19 +308,8 @@ static int get_ioc_prof(int fd, uint32_t init_srp_dm_mad(&out_mad, agent[1], dlid, SRP_DM_ATTR_IO_CONTROLLER_PROFILE, ioc); -again: - if (write(fd, &out_mad, sizeof out_mad) != sizeof out_mad) { - perror("write"); + if (send_and_get(fd, &out_mad, &in_mad, 0) < 0) return -1; - } - - if (read(fd, &in_mad, sizeof in_mad) != sizeof in_mad) { - if (in_mad.hdr.status == ETIMEDOUT) - goto again; - fprintf(stderr, "%s/%d: ", __func__, __LINE__); - perror("read"); - return -1; - } if (in_mad.hdr.status != 0) { fprintf(stderr, "IO Controller Profile query timed out\n"); @@ -340,19 +337,8 @@ static int get_svc_entries(int fd, uint3 init_srp_dm_mad(&out_mad, agent[1], dlid, SRP_DM_ATTR_SERVICE_ENTRIES, (ioc << 16) | (end << 8) | start); -again: - if (write(fd, &out_mad, sizeof out_mad) != sizeof out_mad) { - perror("write"); + if (send_and_get(fd, &out_mad, &in_mad, 0) < 0) return -1; - } - - if (read(fd, &in_mad, sizeof in_mad) != sizeof in_mad) { - if (in_mad.hdr.status == ETIMEDOUT) - goto again; -
[openib-general] [PATCH 09/12] SRP: Changing ibsrpdm
alloca man page on my system says: The alloca() function is machine and compiler dependent. On many systems its implementation is buggy. Its use is discouraged. Lets not use it. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/src/userspace/srptools/src/srp-dm.c === --- last_stable.orig/src/userspace/srptools/src/srp-dm.c2006-04-16 13:09:07.0 +0300 +++ last_stable/src/userspace/srptools/src/srp-dm.c 2006-04-16 13:11:17.0 +0300 @@ -510,7 +510,8 @@ again: int get_port_list(int fd, uint32_t agent[2]) { - struct ib_user_mad out_mad, *in_mad; + uint8_t in_mad_space[SIZE_OF_QUERY_RESPONSE]; + struct ib_user_mad out_mad, *in_mad=(void *) in_mad_space; struct srp_dm_rmpp_sa_mad *out_sa_mad, *in_sa_mad; struct srp_sa_node_rec *node; ssize_t len; @@ -521,8 +522,6 @@ int get_port_list(int fd, uint32_t agent uint64_t subnet_prefix; int isdm; - in_mad= alloca(SIZE_OF_QUERY_RESPONSE); - in_sa_mad = (void *) in_mad->data; out_sa_mad = (void *) out_mad.data; -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCH 08/12] SRP: Changing ibsrpdm
Use constants for bits in masks. Improves readability. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/src/userspace/srptools/src/srp-dm.c === --- last_stable.orig/src/userspace/srptools/src/srp-dm.c2006-04-21 01:18:55.0 +0300 +++ last_stable/src/userspace/srptools/src/srp-dm.c 2006-04-21 03:41:39.0 +0300 @@ -70,6 +70,14 @@ static inline uint64_t ntohll(uint64_t x static inline uint64_t htonll(uint64_t x) { return x; } #endif +#define IS_DM_MASK (1 << 19) + +#define SIZE_OF_QUERY_RESPONSE (1 << 18) + +#define N_COMP_MASK_NODE_TYPE htonll(1 << 4); + +#define N_COMP_MASK_LID htonll(1); + static char *sysfs_path = "/sys"; static void usage(const char *argv0) @@ -474,7 +482,7 @@ static int get_port_info(int fd, uint32_ out_sa_mad->mgmt_class= SRP_MGMT_CLASS_SA; out_sa_mad->class_version = 2; - out_sa_mad->comp_mask = htonll(1); /* LID */ + out_sa_mad->comp_mask = N_COMP_MASK_LID; port_info = (void *) out_sa_mad->data; port_info->endport_lid= htons(dlid); @@ -495,7 +503,7 @@ again: port_info = (void *) in_sa_mad->data; *subnet_prefix = ntohll(port_info->subnet_prefix); - *isdm = !!(ntohl(port_info->capability_mask) & (1 << 19)); + *isdm = !!(ntohl(port_info->capability_mask) & IS_DM_MASK); return 0; } @@ -519,7 +527,7 @@ static int get_port_list(int fd, uint32_ sm_lid = strtol(val, NULL, 0); - in_mad= alloca(1 << 18); + in_mad= alloca(SIZE_OF_QUERY_RESPONSE); in_sa_mad = (void *) in_mad->data; out_sa_mad = (void *) out_mad.data; @@ -529,7 +537,7 @@ static int get_port_list(int fd, uint32_ out_sa_mad->mgmt_class= SRP_MGMT_CLASS_SA; out_sa_mad->method= SRP_SA_METHOD_GET_TABLE; out_sa_mad->class_version = 2; - out_sa_mad->comp_mask = htonll(1ul << 4); /* node type */ + out_sa_mad->comp_mask = N_COMP_MASK_NODE_TYPE; out_sa_mad->rmpp_version = 1; out_sa_mad->rmpp_type = 1; node = (void *) out_sa_mad->data; @@ -541,7 +549,7 @@ again: return -1; } - len = read(fd, in_mad, 1 << 18); + len = read(fd, in_mad, SIZE_OF_QUERY_RESPONSE); if (len < 0) { fprintf(stderr, "%s/%d: ", __func__, __LINE__); perror("read"); -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCH 07/12] SRP: Changing ibsrpdm
Remove trailing spaces and arranging tabs. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/src/userspace/srptools/src/srp-dm.c === --- last_stable.orig/src/userspace/srptools/src/srp-dm.c2006-04-21 03:54:05.0 +0300 +++ last_stable/src/userspace/srptools/src/srp-dm.c 2006-04-21 04:20:43.0 +0300 @@ -102,7 +102,6 @@ static int read_file(const char *dir, co return len; } - static int setup_port_sysfs_path(void) { char *env; char class_dev_path[256]; @@ -135,7 +134,7 @@ static int setup_port_sysfs_path(void) { fprintf(stderr, "Couldn't read ibdev attribute\n"); return -1; } - + if (read_file(class_dev_path, "port", ibport, sizeof ibport) < 0) { fprintf(stderr, "Couldn't read port attribute\n"); return -1; @@ -385,7 +384,7 @@ static int do_port(int fd, uint32_t agen pr_human("change ID: %04x\n", ntohs(iou_info.change_id)); pr_human("max controllers: 0x%02x\n", iou_info.max_controllers); - if (verbose > 0) + if (verbose > 0) for (i = 0; i < iou_info.max_controllers; ++i) { pr_human("controller[%3d]: ", i + 1); switch ((iou_info.controller_list[i / 2] >> -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCH 06/12] SRP: Changing ibsrpdm
Support a display of list of target from user level. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-04-21 01:13:04.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-21 03:56:05.0 +0300 @@ -1730,6 +1730,63 @@ end: static CLASS_DEVICE_ATTR(remove_target, S_IWUSR, NULL, srp_remove_target); +#define TARGET_INFO_BUF_SIZE 126 + +static ssize_t list_targets(struct class_device *class_dev, char *buf) +{ + struct srp_host *host = + container_of(class_dev, struct srp_host, class_dev); + struct srp_target_port *target; + int printed=0, ret; + + mutex_lock(&host->target_mutex); + list_for_each_entry(target, &host->target_list, list) + if (target->state == SRP_TARGET_LIVE) { + ret = sprintf(buf+printed, + "id_ext=%016llx,ioc_guid=%016llx," + "dgid=%04x%04x%04x%04x%04x%04x%04x%04x," + "pkey=%04x,service_id=%016llx\n", + (unsigned long long) + be64_to_cpu(target->id_ext), + (unsigned long long) + be64_to_cpu(target->ioc_guid), + (int) be16_to_cpu(*(__be16 *) + &target->path.dgid.raw[0]), + (int) be16_to_cpu(*(__be16 *) + &target->path.dgid.raw[2]), + (int) be16_to_cpu(*(__be16 *) + &target->path.dgid.raw[4]), + (int) be16_to_cpu(*(__be16 *) + &target->path.dgid.raw[6]), + (int) be16_to_cpu(*(__be16 *) + &target->path.dgid.raw[8]), + (int) be16_to_cpu(*(__be16 *) + &target->path.dgid.raw[10]), + (int) be16_to_cpu(*(__be16 *) + &target->path.dgid.raw[12]), + (int) be16_to_cpu(*(__be16 *) + &target->path.dgid.raw[14]), + be16_to_cpu(target->path.pkey), + (unsigned long long) + be64_to_cpu(target->service_id)); + if (ret <= 0) + goto end; + + printed += ret; + + if (printed + TARGET_INFO_BUF_SIZE > PAGE_SIZE - 1) + break; + } + + ret = printed; + +end: + mutex_unlock(&host->target_mutex); + return ret; +} + +static CLASS_DEVICE_ATTR(list_targets, S_IRUGO, list_targets, NULL); + static ssize_t show_ibdev(struct class_device *class_dev, char *buf) { struct srp_host *host = @@ -1789,6 +1846,8 @@ static struct srp_host *srp_add_port(str goto err_class; if (class_device_create_file(&host->class_dev, &class_device_attr_remove_target)) goto err_class; + if (class_device_create_file(&host->class_dev, &class_device_attr_list_targets)) + goto err_class; if (class_device_create_file(&host->class_dev, &class_device_attr_ibdev)) goto err_class; if (class_device_create_file(&host->class_dev, &class_device_attr_port)) -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCH 05/12] SRP: Changing ibsrpdm
Support a remove of a target from user level. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-05-01 12:30:01.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-05-01 12:36:22.0 +0300 @@ -960,10 +960,12 @@ static int srp_queuecommand(struct scsi_ long req_index; int len; - if (target->state == SRP_TARGET_CONNECTING) + if (target->state == SRP_TARGET_CONNECTING || + target->state == SRP_TARGET_RECONNECTING) goto err; if (target->state == SRP_TARGET_DEAD || + target->state == SRP_TARGET_DISCONNECTED || target->state == SRP_TARGET_REMOVED) { scmnd->result = DID_BAD_TARGET << 16; done(scmnd); @@ -1254,6 +1256,7 @@ static int srp_send_tsk_mgmt(struct scsi spin_lock_irq(target->scsi_host->host_lock); if (target->state == SRP_TARGET_DEAD || + target->state == SRP_TARGET_DISCONNECTED || target->state == SRP_TARGET_REMOVED) { scmnd->result = DID_BAD_TARGET << 16; goto out; @@ -1359,6 +1362,7 @@ static ssize_t show_ioc_guid(struct clas struct srp_target_port *target = host_to_target(class_to_shost(cdev)); if (target->state == SRP_TARGET_DEAD || + target->state == SRP_TARGET_DISCONNECTED || target->state == SRP_TARGET_REMOVED) return -ENODEV; @@ -1371,6 +1375,7 @@ static ssize_t show_service_id(struct cl struct srp_target_port *target = host_to_target(class_to_shost(cdev)); if (target->state == SRP_TARGET_DEAD || + target->state == SRP_TARGET_DISCONNECTED || target->state == SRP_TARGET_REMOVED) return -ENODEV; @@ -1383,6 +1388,7 @@ static ssize_t show_pkey(struct class_de struct srp_target_port *target = host_to_target(class_to_shost(cdev)); if (target->state == SRP_TARGET_DEAD || + target->state == SRP_TARGET_DISCONNECTED || target->state == SRP_TARGET_REMOVED) return -ENODEV; @@ -1394,6 +1400,8 @@ static ssize_t show_dgid(struct class_de struct srp_target_port *target = host_to_target(class_to_shost(cdev)); if (target->state == SRP_TARGET_DEAD || + target->state == SRP_TARGET_DISCONNECTED || + target->state == SRP_TARGET_DISCONNECTED || target->state == SRP_TARGET_REMOVED) return -ENODEV; @@ -1447,11 +1455,11 @@ static int srp_add_target(struct srp_hos if (scsi_add_host(target->scsi_host, host->dev->dev->dma_device)) return -ENODEV; - mutex_lock(&host->target_mutex); list_add_tail(&target->list, &host->target_list); - mutex_unlock(&host->target_mutex); + spin_lock_irq(target->scsi_host->host_lock); target->state = SRP_TARGET_LIVE; + spin_unlock_irq(target->scsi_host->host_lock); /* XXX: are we supposed to have a definition of SCAN_WILD_CARD ?? */ scsi_scan_target(&target->scsi_host->shost_gendev, @@ -1642,7 +1650,6 @@ static ssize_t srp_create_target(struct { struct srp_host *host = container_of(class_dev, struct srp_host, class_dev); - struct Scsi_Host *target_host; struct srp_target_port *target, *existing_target = NULL; int ret; int i; @@ -1663,6 +1670,7 @@ static ssize_t srp_create_target(struct buf); ret = -EEXIST; break; + case SRP_TARGET_RECONNECTING: case SRP_TARGET_CONNECTING: /* It is in the middle of reconnecting */ ret = -EALREADY; @@ -1671,6 +1679,10 @@ static ssize_t srp_create_target(struct /* It will be removed soon - create a new one */ case SRP_TARGET_REMOVED: /* target is dead, create a new one */ + existing_target = NULL; + break; + case SRP_TARGET_DISCONNECTED: + existing_target->state = SRP_TARGET_RECONNECTING; break; } spin_unlock_irq(existing_target->scsi_host->host_lock); @@ -1678,26 +1690,30 @@ static ssize_t srp_create_target(struct goto unlock_mutex; } - /* really create the target */ - target_host = scsi_host_alloc(&srp_template, - sizeof (struct srp_target_port)); - if (!target_h
[openib-general] [PATCH 04/12] SRP: Changing ibsrpdm
Do not add the same target twice. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-04-25 15:17:34.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-25 15:19:37.0 +0300 @@ -1478,7 +1478,8 @@ static int srp_parse_options(const char printk(KERN_WARNING PFX "bad max sect parameter '%s'\n", p); goto out; } - target->scsi_host->max_sectors = token; + if (target->scsi_host != NULL) + target->scsi_host->max_sectors = token; break; default: @@ -1503,20 +1504,89 @@ out: return ret; } +/* srp_find_target - If the target exists return it in target, + otherwise target is set to NULL. + host->target_mutex should be hold */ +static int srp_find_target(const char *buf, struct srp_host *host, + struct srp_target_port **target) +{ + struct srp_target_port *target_to_find, *curr_target; + int ret, i; + + target_to_find = kzalloc(sizeof *target_to_find, GFP_KERNEL); + ret = srp_parse_options(buf, target_to_find); + if (ret) + goto free; + + list_for_each_entry(curr_target, &host->target_list, list) + if (target_to_find->ioc_guid == curr_target->ioc_guid && + target_to_find->id_ext == curr_target->id_ext && + target_to_find->path.pkey == curr_target->path.pkey && + target_to_find->service_id == curr_target->service_id) { + for (i = 0; i < 16; ++i) + if (target_to_find->path.dgid.raw[i] != curr_target->path.dgid.raw[i]) + break; + if (i == 16) { + *target = curr_target; + goto free; + } + } + + *target = NULL; + +free: + kfree(target_to_find); + return 0; +} + static ssize_t srp_create_target(struct class_device *class_dev, const char *buf, size_t count) { struct srp_host *host = container_of(class_dev, struct srp_host, class_dev); struct Scsi_Host *target_host; - struct srp_target_port *target; + struct srp_target_port *target, *existing_target = NULL; int ret; int i; + /* first check if the target already exists */ + + mutex_lock(&host->target_mutex); + ret = srp_find_target(buf, host, &existing_target); + if (ret) + goto unlock_mutex; + + if (existing_target) { + /* target already exists */ + spin_lock_irq(existing_target->scsi_host->host_lock); + switch (existing_target->state) { + case SRP_TARGET_LIVE: + printk(KERN_WARNING PFX "target %s already exists\n", + buf); + ret = -EEXIST; + break; + case SRP_TARGET_CONNECTING: + /* It is in the middle of reconnecting */ + ret = -EALREADY; + break; + case SRP_TARGET_DEAD: + /* It will be removed soon - create a new one */ + case SRP_TARGET_REMOVED: + /* target is dead, create a new one */ + break; + } + spin_unlock_irq(existing_target->scsi_host->host_lock); + if (ret) + goto unlock_mutex; + } + + /* really create the target */ target_host = scsi_host_alloc(&srp_template, sizeof (struct srp_target_port)); - if (!target_host) - return -ENOMEM; + if (!target_host) { + ret = -ENOMEM; + goto unlock_mutex; + } target_host->max_lun = SRP_MAX_LUN; @@ -1533,7 +1603,7 @@ static ssize_t srp_create_target(struct ret = srp_parse_options(buf, target); if (ret) - goto err; + goto err_put_scsi_host; ib_get_cached_gid(host->dev, host->port, 0, &target->path.sgid); @@ -1554,7 +1624,7 @@ static ssize_t srp_create_target(struct ret = srp_create_target_ib(target); if (ret) - goto err; + goto err_put_scsi_host; target->cm_id =
[openib-general] [PATCH 03/12] SRP: Changing ibsrpdm
It is nicer to perform the init_work just before the call to schedule_work. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-04-17 10:57:59.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-18 02:26:29.0 +0300 @@ -828,8 +829,10 @@ static void srp_completion(struct ib_cq wc.wr_id & SRP_OP_RECV ? "receive" : "send", wc.status); spin_lock_irqsave(target->scsi_host->host_lock, flags); - if (target->state == SRP_TARGET_LIVE) + if (target->state == SRP_TARGET_LIVE) { + INIT_WORK(&target->work, srp_reconnect_work, target); schedule_work(&target->work); + } spin_unlock_irqrestore(target->scsi_host->host_lock, flags); break; } @@ -1601,8 +1684,6 @@ static ssize_t srp_create_target(struct target->scsi_host = target_host; target->srp_host = host; - INIT_WORK(&target->work, srp_reconnect_work, target); - for (i = 0; i < SRP_SQ_SIZE - 1; ++i) target->req_ring[i].next = i + 1; target->req_ring[SRP_SQ_SIZE - 1].next = -1; -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCHE 02/12] SRP: changing ibsrpdm
Move the destruction of the host and the removal from a list to a function. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-04-23 14:08:03.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-24 10:47:00.0 +0300 @@ -344,6 +344,16 @@ static void srp_disconnect_target(struct wait_for_completion(&target->done); } +static void destruct_scsi_host_and_target(struct srp_target_port *target, int disconnect_target) +{ + scsi_remove_host(target->scsi_host); + if (disconnect_target) + srp_disconnect_target(target); + ib_destroy_cm_id(target->cm_id); + srp_free_target_ib(target); + scsi_host_put(target->scsi_host); +} + static void srp_remove_work(void *target_ptr) { struct srp_target_port *target = target_ptr; @@ -357,10 +374,7 @@ static void srp_remove_work(void *target list_del(&target->list); mutex_unlock(&target->srp_host->target_mutex); - scsi_remove_host(target->scsi_host); - ib_destroy_cm_id(target->cm_id); - srp_free_target_ib(target); - scsi_host_put(target->scsi_host); + destruct_scsi_host_and_target(target, 0); /* And another put to really free the target port... */ scsi_host_put(target->scsi_host); } @@ -1734,13 +1746,8 @@ static void srp_remove_one(struct ib_dev flush_scheduled_work(); list_for_each_entry_safe(target, tmp_target, -&host->target_list, list) { - scsi_remove_host(target->scsi_host); - srp_disconnect_target(target); - ib_destroy_cm_id(target->cm_id); - srp_free_target_ib(target); - scsi_host_put(target->scsi_host); - } +&host->target_list, list) + destruct_scsi_host_and_target(target, 1); ib_dereg_mr(host->mr); ib_dealloc_pd(host->pd); -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCHE 01/12] SRP: changing ibsrpdm
Remove a redundant if Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-04-17 10:06:19.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-17 10:11:24.0 +0300 @@ -1798,8 +1798,7 @@ static void srp_remove_one(struct ib_dev list_for_each_entry_safe(target, tmp_target, &host->target_list, list) { spin_lock_irqsave(target->scsi_host->host_lock, flags); - if (target->state != SRP_TARGET_REMOVED) - target->state = SRP_TARGET_REMOVED; + target->state = SRP_TARGET_REMOVED; spin_unlock_irqrestore(target->scsi_host->host_lock, flags); } mutex_unlock(&host->target_mutex); -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCH 00/12] SRP: Changing ibsrpdm
Hi, I'm going to send 12 patches. 6 patches for the kernel, and 6 for the userspace ibsrpdm. The kernel patches avoid adding the same target twice, allow the removal of a target, and add a query about the connected targets. The userspace patches change ibsrpdm to a real daemon - that runs all the time and updates the kernel with the visible targets. Some of the kernel patches should be applied after Vu patches for fmr. The functionality of the changes can work without Vu's patches, but they are changing code in the same functions, so there may be some simple conflicts when applied without Vu's patches. -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] [PATCH] SRP: fix crash in srp_process_rsp
Hi srp_process_rsp crashes on NULL pointer dereference. The following fixes the crash. Is this a correct fix? --- Avoiding dereference of a null pointer. Signed-off-by: Ishai Rabinovitz <[EMAIL PROTECTED]> Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c === --- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c2006-04-26 15:38:23.0 +0300 +++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c 2006-04-26 17:45:22.0 +0300 @@ -655,9 +655,11 @@ static void srp_process_rsp(struct srp_t complete(&req->done); } else { scmnd = req->scmnd; - if (!scmnd) + if (!scmnd) { printk(KERN_ERR "Null scmnd for RSP w/tag %016llx\n", (unsigned long long) rsp->tag); + goto unlock; + } scmnd->result = rsp->status; if (rsp->flags & SRP_RSP_FLAG_SNSVALID) { @@ -683,7 +685,7 @@ static void srp_process_rsp(struct srp_t } else req->cmd_done = 1; } - +unlock: spin_unlock_irqrestore(target->scsi_host->host_lock, flags); } -- Ishai Rabinovitz ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
[openib-general] FW: [openfabrics-ewg] Changes in SM for the new SRP daemon
Yes it should. Ishai From: Woodruff, Robert J [mailto:[EMAIL PROTECTED] Sent: Thursday, April 06, 2006 1:31 AMTo: Ishai Rabinovitz; [EMAIL PROTECTED]Subject: RE: [openfabrics-ewg] Changes in SM for the new SRP daemon Shouldn't this discussion be on openib-general ? From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ishai RabinovitzSent: Wednesday, April 05, 2006 2:14 PMTo: [EMAIL PROTECTED]Subject: [openfabrics-ewg] Changes in SM for the new SRP daemon Summary: In the next release of SRP we want to add a daemon that is executed in each initiator and finds out which targets exist in the fabric. The new daemon can use SA capabilities from IBTA 1.2 errata (details at the bottom) to improve performance. If you are using SM other then openIB's openSM please support this feature in your SM. Details: When one wants to find all SRP targets and their information in the fabric, he/she currently run "ibsrpdm""ibsrpdm" uses the following procedure to discover all SRP targets available in the fabric "ibsrpdm" sends a query get_table node info with node type is CA. It gets quite a big table and then for every node it sends a port_info query. From the response to this query the initiator can check if this port is an SRP target. (dm bit capability is set) The problem with this procedure is that it may create too much traffic on the fabric. Let's assume there is a cluster of 4096 nodes booting together. Each of this 4096 nodes is sending the first query and gets a list of 4096 nodes. This list is divided into a long number of UD messages and may cause retransmit. After getting this list each node sends 4096 queries for the port of each node. This traffic is quite huge. The SA has a new capability to answer the query: "please return a list of the ports that has the dm bit set" (meaning return ports of SRP targets). (This capability of the SA is part of Errata Release Version: 1.2 1/26/2006 Chapter/Subsection: 15.2.5.3 - quoted at the bottom). Using this capability we can use the following procedure: The daemon will send "get table port_info of ports that has dm bit set" query and gets a table of small number of port_info. For each port It queries for the guid of this ports. This will significantly reduce the traffic on the fabric. Actually, in this solution there is so little traffic that the new daemon will run it periodically (every minute) to look for changes (There will be less traffic than registering to Trap 64 and Trap 65.) Quoting the errata: Errata Tracking Number: MGTWG8372 Sub-Case Number: 0 Reference ID: 4291 Title: Enhanced SA PortInfoRecord searches Submitter: Livingston, James ([EMAIL PROTECTED]) Volume: 1 Revision: 1.2 Errata Release Version: 1.2 1/26/2006 Chapter/Subsection: 15 Page: 885 Line: 20 AssignedIntensity: Status/Disposition: WG_Approved Problem Description: Add a new row to Table 186 SA-Specific ClassPortInfo:CapabilityMask Bits: Problem Resolution: Add a new row to Table 186 SA-Specific ClassPortInfo:CapabilityMask Bits Original Text: Corrected Text: IsPortInfoCapMaskMatchSupported 13 If this value is 1, SA shall support matching the PortInfo:CapabilityMask component as described in 15.2.5.3>. Comment History: Dec 14, 2005 08:11:26 PM Old: New:Pending By:Benner, Alan ([EMAIL PROTECTED]) Errata Tracking Number: MGTWG8372 Sub-Case Number: 1 Reference ID: 4292 Title: Enhanced SA PortInfoRecord searches Submitter: Livingston, James ([EMAIL PROTECTED]) Volume: 1 Revision: 1.2 Errata Release Version: 1.2 1/26/2006 Chapter/Subsection: 15.2.5.3 Page: 891 Line: 36 AssignedIntensity: Status/Disposition: WG_Approved Problem Description: Add optional compliance statement for new capability. Problem Resolution: Make the proposed change to the spec text. Original Text: Corrected Text: o15-0.x.y: If SA's ClassPortInfo:CapabilityMask.IsPortInfoCapMaskMatchSupported is 1, then the AttributeModifier of the SubnAdmGet() and SubnAdmGetTable() methods affects the matching behavior on the PortInfo:CapabilityMask component. If the high-order bit (bit 31) of the AttributeModifier is set to 1, matching on the CapabilityMask component will not be an exact bitwise match as described in . Instead, matching will only be performed on those bits which are set to 1 in the PortInfo:CapabilityMask embedded in the query. In , bits in the PortInfo:CapabilityMask embedded in the query that are set to 0 are bitwi