Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
On Thu, 2015-01-08 at 14:57 -0800, Nicholas A. Bellinger wrote:
On Thu, 2015-01-08 at 14:29 -0800, James Bottomley wrote:
On Thu, 2015-01-08 at 14:16 -0800, Nicholas A. Bellinger wrote:
On Thu, 2015-01-08 at 08:50 +0100, Bart Van Assche wrote:
On 01/07/15 22:39, Mike Christie wrote:
On 01/07/2015 10:57 AM, Hannes Reinecke wrote:
On 01/07/2015 05:25 PM, Sagi Grimberg wrote:

Hi everyone, now that scsi-mq is fully included, we need an iSCSI initiator that would use it to achieve scalable performance. The need is even greater for iSCSI offload devices and transports that support multiple HW queues. As iSER maintainer I'd like to discuss the way we would choose to implement that in iSCSI. My measurements show that the iSER initiator can scale up to ~2.1M IOPs with multiple sessions but only ~630K IOPs with a single session, where the most significant bottleneck is the (single) core processing completions. In the existing single-connection-per-session model, given that command ordering must be preserved session-wide, we end up with serial command execution over a single connection, which is basically a single-queue model. The best fit seems to be plugging iSCSI MCS in as a multi-queued SCSI LLDD. In this model, a hardware context will have a 1x1 mapping with an iSCSI connection (TCP socket or a HW queue). iSCSI MCS and its role in the presence of the dm-multipath layer was discussed several times in the past decade(s). The basic need for MCS is implementing a multi-queue data path, so perhaps we may want to avoid doing any type of link aggregation or load balancing so as not to overlap dm-multipath. For example we can implement ERL=0 (which is basically the scsi-mq ERL) and/or restrict a session to a single portal.

As I see it, the todo's are:
1. Getting MCS to work (kernel + user-space) with ERL=0 and a round-robin connection selection (per SCSI command execution).
2. Plug into scsi-mq - exposing num_connections as nr_hw_queues and using blk-mq based queue (conn) selection.
3. Rework the iSCSI core locking scheme to avoid session-wide locking as much as possible.
4. Use blk-mq pre-allocation and tagging facilities.

I've recently started looking into this. I would like the community to agree (or debate) on this scheme and also talk about implementation with anyone who is also interested in this.

Yes, that's a really good topic. I've pondered implementing MC/S for iscsi/TCP but figured my network implementation knowledge doesn't stretch that far. So yeah, a discussion here would be good. Mike? Any comments?

I have been working under the assumption that people would be ok with MCS upstream if we are only using it to handle the issue where we want to do something like have a tcp/iscsi connection per CPU, and then map each connection to a blk_mq_hw_ctx. In this more limited MCS implementation there would be no iscsi-layer code to do something like load balancing across ports or transport paths the way dm-multipath does, so there would be no feature/code duplication. For balancing across hctxs, the iscsi layer would likewise leave that up to whatever we end up with in the upper layers, so again no feature/code duplication with upper layers. So, pretty non-controversial I hope :) If people want to add something like round-robin connection selection in the iscsi layer, then I think we want to leave that for after the initial merge, so people can argue about that separately.
Hello Sagi and Mike, I agree with Sagi that adding scsi-mq support in the iSER initiator would help iSER users, because that would allow these users to configure a single iSER target and use the multiqueue feature instead of having to configure multiple iSER targets to spread the workload over multiple CPUs at the target side. And I agree with Mike that implementing scsi-mq support in the iSER initiator as multiple independent connections probably is a better choice than MC/S. Specifically, RFC 3720 requires that iSCSI command numbering is session-wide. This means maintaining a single counter for all connections in an MC/S session. Such a counter would be a contention point. I'm afraid that because of that counter, performance on a multi-socket initiator system with a scsi-mq implementation based on MC/S could be worse than with the approach with multiple iSER targets. Hence my preference for an approach based on multiple independent iSER connections instead of MC/S.

The idea that a simple session-wide counter for command sequence number assignment adds such a degree of contention that it renders MC/S at a performance disadvantage vs. multi-session configurations, with all of the extra multipath logic overhead on top, is, at best, a naive proposition.
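To make item 2 of the todo list quoted above concrete, here is a minimal sketch of how an initiator could report one hardware queue per connection and route each command to the connection backing the hw context blk-mq picked. The struct and helper names (iscsi_mq_session, conns[], iscsi_mq_cmd_to_conn) are illustrative assumptions, not code from any posted patch; the blk_mq_unique_tag() helpers and the host's nr_hw_queues field are the pieces current kernels provide for this.

#include <linux/blk-mq.h>
#include <scsi/scsi_cmnd.h>
#include <scsi/scsi_host.h>

struct iscsi_conn;			/* one TCP socket or one HW queue */

struct iscsi_mq_session {
	unsigned int	  nr_conns;	/* negotiated connection count */
	struct iscsi_conn **conns;	/* conns[i] backs hw queue i */
};

/* report one blk-mq hardware queue per iSCSI connection */
static void iscsi_mq_init_host(struct Scsi_Host *shost,
			       struct iscsi_mq_session *sess)
{
	shost->nr_hw_queues = sess->nr_conns;
}

/* queuecommand path: the hw queue index selects the connection, 1:1 */
static struct iscsi_conn *iscsi_mq_cmd_to_conn(struct iscsi_mq_session *sess,
					       struct scsi_cmnd *sc)
{
	u32 tag = blk_mq_unique_tag(sc->request);

	return sess->conns[blk_mq_unique_tag_to_hwq(tag)];
}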
Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
On 1/8/2015 9:50 AM, Bart Van Assche wrote:
On 01/07/15 22:39, Mike Christie wrote:
On 01/07/2015 10:57 AM, Hannes Reinecke wrote:
On 01/07/2015 05:25 PM, Sagi Grimberg wrote:
SNIP
Hello Sagi and Mike, I agree with Sagi that adding scsi-mq support in the iSER initiator would help iSER users because that would allow these users to configure a single iSER target and use the multiqueue feature instead of having to configure multiple iSER targets to spread the workload over multiple cpus at the target side.
Hey Bart, IMHO iSER is an iSCSI extension, so I think the discussion should focus on solving this at the iSCSI level, in a way that would apply to both TCP and RDMA (and offload devices).

And I agree with Mike that implementing scsi-mq support in the iSER initiator as multiple independent connections probably is a better choice than MC/S.

Actually I started with that approach, but independent connections under a single session (I-T Nexus) violate the command ordering requirement. Plus, such a solution is specific to iSER...

Specifically, RFC 3720 requires that iSCSI command numbering is session-wide. This means maintaining a single counter for all connections in an MC/S session. Such a counter would be a contention point. I'm afraid that because of that counter, performance on a multi-socket initiator system with a scsi-mq implementation based on MC/S could be worse than with the approach with multiple iSER targets. Hence my preference for an approach based on multiple independent iSER connections instead of MC/S.

So this comment is spot on regarding the pros/cons of the discussion (we might want to leave something for LSF ;)). MCS would not allow a completely lockless data path due to command ordering. On the other hand, implementing some kind of multiple-sessions solution feels somewhat like a misfit (at least in my view). One of my thoughts about how to overcome the contention on command sequence numbering was to suggest some kind of
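To make that contention point concrete, this is roughly what session-wide CmdSN assignment looks like when every connection's submission path has to go through it. The names below are purely illustrative, not the current libiscsi code.

#include <linux/spinlock.h>
#include <linux/types.h>

struct mcs_session {
	spinlock_t cmdsn_lock;	/* shared by every connection in the session */
	u32	   cmdsn;	/* RFC 3720: strictly increasing, session-wide */
};

/* called for every non-immediate command, from any hw queue/connection */
static u32 mcs_assign_cmdsn(struct mcs_session *sess)
{
	u32 sn;

	spin_lock(&sess->cmdsn_lock);
	sn = sess->cmdsn++;		/* the cross-queue serialization point */
	spin_unlock(&sess->cmdsn_lock);
	return sn;
}

Whether that shared counter is a negligible bump or a real stall point on a multi-socket box is exactly what gets debated later in this thread.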
Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
On 1/8/2015 4:50 PM, James Bottomley wrote:
SNIP
If people want to add something like round robin connection selection in the iscsi layer, then I think we want to leave that for after the initial merge, so people can argue about that separately.

Well, you're right, we can argue about it later, but if it's just round robin, why would it be better done in the initiator rather than dm?

I agree. My assumption was that a round-robin conn selection would only be a temporary stage until we get full integration with scsi-mq. Not something that would actually be merged.

Sagi.
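For reference, the interim round-robin pick being discussed is tiny; a hypothetical sketch with invented names, not a posted patch:

#include <linux/atomic.h>

struct iscsi_conn;

struct mcs_session_rr {
	atomic_t	  next_conn;	/* monotonically increasing pick counter */
	unsigned int	  nr_conns;
	struct iscsi_conn **conns;
};

/* per-command connection pick, until blk-mq drives the choice via the hctx */
static struct iscsi_conn *mcs_pick_conn_rr(struct mcs_session_rr *sess)
{
	unsigned int n = (unsigned int)atomic_inc_return(&sess->next_conn);

	return sess->conns[n % sess->nr_conns];
}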
Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
On Wed 07-01-15 09:22:13, Lee Duncan wrote:
On 01/07/2015 08:25 AM, Sagi Grimberg wrote:
SNIP
I've recently started looking into this. I would like the community to agree (or debate) on this scheme and also talk about implementation with anyone who is also interested in this. Cheers, Sagi.

I started looking at this last year (at Hannes' suggestion), and would love to join the discussion. Please add me to the list of those that wish to attend.

For that, please send a separate email with an attend request, as described in the call for proposals. Thanks!

Honza
--
Jan Kara j...@suse.cz
SUSE Labs, CR
Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
On 1/8/15, 4:57 PM, Nicholas A. Bellinger wrote:
On Thu, 2015-01-08 at 14:29 -0800, James Bottomley wrote:
On Thu, 2015-01-08 at 14:16 -0800, Nicholas A. Bellinger wrote:
On Thu, 2015-01-08 at 08:50 +0100, Bart Van Assche wrote:
On 01/07/15 22:39, Mike Christie wrote:
On 01/07/2015 10:57 AM, Hannes Reinecke wrote:
On 01/07/2015 05:25 PM, Sagi Grimberg wrote:
SNIP
Hello Sagi and Mike, I agree with Sagi that adding scsi-mq support in the iSER initiator would help iSER users because that would allow these users to configure a single iSER target and use the multiqueue feature instead of having to configure multiple iSER targets to spread the workload over multiple cpus at the target side. And I agree with Mike that implementing scsi-mq support in the iSER initiator as multiple independent connections probably is a better choice than MC/S. Specifically, RFC 3720 requires that iSCSI command numbering is session-wide. This means maintaining a single counter for all connections in an MC/S session. Such a counter would be a contention point. I'm afraid that because of that counter, performance on a multi-socket initiator system with a scsi-mq implementation based on MC/S could be worse than with the approach with multiple iSER targets. Hence my preference for an approach based on multiple independent iSER connections instead of MC/S.

The idea that a simple session-wide counter for command sequence number assignment adds such a degree of contention that it renders MC/S at a performance disadvantage vs. multi-session configurations, with all of the extra multipath logic overhead on top, is, at best, a naive proposition. On the initiator side for MC/S, literally the only thing that needs to be serialized is the assignment of the command sequence number to individual non-immediate PDUs. The sending of the outgoing PDUs + immediate data by the initiator can happen out of order, and it's up to the target to ensure that the submission of the commands to the SCSI layer happens in CmdSN order.
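The argument above boils down to something like the following sketch (hypothetical names, not initiator code): CmdSN assignment is a single atomic increment, and everything after it - building the PDU, attaching immediate data, the socket send or RDMA post - proceeds per connection and may complete out of order.

#include <linux/atomic.h>
#include <linux/types.h>

struct mcs_session_sn {
	atomic_t cmdsn;		/* session-wide CmdSN, per RFC 3720 */
};

static u32 mcs_next_cmdsn(struct mcs_session_sn *sess)
{
	/* the only step shared across connections/hw queues */
	return (u32)atomic_inc_return(&sess->cmdsn);
}

/*
 * Per-connection after this point: allocate and build the PDU, attach any
 * immediate data, and send on that connection's socket or post to its HW
 * queue.  The target re-orders delivery to the SCSI layer by CmdSN.
 */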
Re: iscsi_tcp bound to network interface issues after iscsid: retry login for ISCSI_ERR_HOST_NOT_FOUND
On 01/08/2015 11:11 AM, Chris Leech wrote:
On Thu, Jan 08, 2015 at 10:36:59AM -0600, Michael Christie wrote:
On Jan 6, 2015, at 6:40 PM, Chris Leech cle...@redhat.com wrote:
Hi all, It looks to me that the changes in "iscsid: retry login for ISCSI_ERR_HOST_NOT_FOUND" have broken interface binding for iscsi_tcp (and iser, assuming interface binding ...

iser does not do binding.

Thanks for clearing that up, less to worry about then.

Is this a regression with the patch:

commit c0e509e7535372cd5d655bc5a20d3d2bae45df84
Author: Mike Christie micha...@cs.wisc.edu
Date: Wed May 7 14:38:13 2014 -0500

    iscsid: retry login for ISCSI_ERR_HOST_NOT_FOUND

Yes, it looks that way to me.

Didn't you have a patch to add some flag on the iscsi_transport struct in userspace that indicated if it was an offload driver, or that something like bind_ep was required? I think you sent it to me when I was at my old job. Instead of trying to be cute and detect it, we could just do your patch and modify this test:

} else if (*rc == ISCSI_ERR_HOST_NOT_FOUND && t->bind_ep_required) {
	goto free_session;

Or, I guess maybe it is better to check before, so do:

if (t->bind_ep_required) {
	host_no = iscsi_sysfs_get_host_no_from_hwinfo(iface, rc);
	...
}

We can then update transport.c and remove the session->conn[0].bind_ep = 1; lines.
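A rough sketch of the flag being proposed here (bind_ep_required does not exist in open-iscsi today, the struct is trimmed to just what the example needs, and which transports would set it is only a guess):

#include <stdint.h>

struct iscsi_transport_tmpl {
	const char *name;
	uint8_t	    bind_ep_required;	/* needs a scsi_host/endpoint before login */
};

static struct iscsi_transport_tmpl transports[] = {
	{ .name = "tcp",    .bind_ep_required = 0 },	/* iscsi_tcp: no host exists before login */
	{ .name = "iser",   .bind_ep_required = 0 },	/* per above: iser does not do binding    */
	{ .name = "bnx2i",  .bind_ep_required = 1 },	/* offload: host must already exist        */
	{ .name = "cxgb4i", .bind_ep_required = 1 },
};

With such a flag, the ISCSI_ERR_HOST_NOT_FOUND handling quoted above would only fire for transports that genuinely need the host lookup, so interface-bound iscsi_tcp and iser logins would no longer be affected.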
Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
On Thu, 2015-01-08 at 21:03 -0800, Nicholas A. Bellinger wrote:
On Thu, 2015-01-08 at 15:22 -0800, James Bottomley wrote:
On Thu, 2015-01-08 at 14:57 -0800, Nicholas A. Bellinger wrote:
On Thu, 2015-01-08 at 14:29 -0800, James Bottomley wrote:
On Thu, 2015-01-08 at 14:16 -0800, Nicholas A. Bellinger wrote:
SNIP

The point is that a simple session-wide counter for command sequence number assignment is significantly less overhead than all of the overhead associated with running a full multipath stack atop multiple sessions.

I don't see how that's relevant to issue speed, which was the measure we were using: the layers above are just a hopper. As long as they're loaded, the MQ lower layer can issue at full speed. So as long as the multipath hopper is efficient enough to keep the queues loaded there's no speed degradation. The problem with a sequence point inside the MQ issue layer is that it can cause a stall that reduces the issue speed. So the counter sequence point causes a degraded issue speed over the multipath hopper approach above, even if the multipath approach has a higher CPU overhead. Now, if the system is close to 100% CPU already, *then* the multipath overhead will try to take CPU power we don't have and cause a stall, but that's only in the flat-out CPU case.

Not to mention that our iSCSI/iSER initiator is already taking a session-wide lock when sending outgoing PDUs, so adding a session-wide counter isn't adding any additional synchronization overhead vs. what's already in place.

I'll leave it up to the iSER people to decide whether they're redoing this as part of the MQ work.

Session-wide command sequence number synchronization isn't something to be removed as part of the MQ work. It's an iSCSI/iSER protocol requirement.

The sequence number is a requirement of the session. Multiple separate sessions mean no SN correlation between the different connections, so no global requirement for an SN counter across the queues ... that's what Mike was saying about implementing multipath without using MCS. With MCS we have a single session for all the queues and thus have to correlate the sequence number across all the connections and hence all the queues; without it we don't. That's why the sequence number becomes a potential stall point in an MQ implementation of MCS, which can be obviated if we use a separate session per queue.

James
Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
On Thu, 2015-01-08 at 14:16 -0800, Nicholas A. Bellinger wrote:
On Thu, 2015-01-08 at 08:50 +0100, Bart Van Assche wrote:
On 01/07/15 22:39, Mike Christie wrote:
On 01/07/2015 10:57 AM, Hannes Reinecke wrote:
On 01/07/2015 05:25 PM, Sagi Grimberg wrote:
SNIP
Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
On Thu, 2015-01-08 at 14:29 -0800, James Bottomley wrote:
On Thu, 2015-01-08 at 14:16 -0800, Nicholas A. Bellinger wrote:
On Thu, 2015-01-08 at 08:50 +0100, Bart Van Assche wrote:
On 01/07/15 22:39, Mike Christie wrote:
On 01/07/2015 10:57 AM, Hannes Reinecke wrote:
On 01/07/2015 05:25 PM, Sagi Grimberg wrote:
SNIP
Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
On 1/8/15, 1:50 AM, Bart Van Assche wrote:
On 01/07/15 22:39, Mike Christie wrote:
On 01/07/2015 10:57 AM, Hannes Reinecke wrote:
On 01/07/2015 05:25 PM, Sagi Grimberg wrote:
SNIP
Hello Sagi and Mike, I agree with Sagi that adding scsi-mq support in the iSER initiator would help iSER users because that would allow these users to configure a single iSER target and use the multiqueue feature instead of having to configure multiple iSER targets to spread the workload over multiple cpus at the target side.
And I agree with Mike that implementing scsi-mq support in the iSER initiator as multiple independent connections probably is a better choice than MC/S. Specifically, RFC 3720 requires that iSCSI command numbering is session-wide. This means maintaining a single counter for all connections in an MC/S session. Such a counter would be a contention point. I'm afraid that because of that counter, performance on a multi-socket initiator system with a scsi-mq implementation based on MC/S could be worse than with the approach with multiple iSER targets. Hence my preference for an approach based on multiple independent iSER connections instead of MC/S.

Above I was actually saying we should do a limited MCS. Originally, I tried something like you are suggesting for the non-MCS case, but I hit some snags. While I was rethinking it today, I think I figured out where I messed up though. It was just in how I was doing the device/kobject/sysfs compat stuff.

Sagi, instead of reviewing that patch you sent me off-list the other day, let me try to update my non-MCS patch (I originally did it before you guys did the locking changes, so I need to fix it up) and send it tomorrow. We then do not have to worry about MCS support, and also issues like the session-wide sequence number tracking.
Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
On 01/08/15 14:45, Sagi Grimberg wrote:
Actually I started with that approach, but independent connections under a single session (I-T Nexus) violate the command ordering requirement. Plus, such a solution is specific to iSER...

Hello Sagi,

Which command ordering requirement are you referring to? The Linux storage stack does not guarantee that block layer or SCSI commands will be processed in the same order as these commands have been submitted. However, it might be interesting to have a look at virtscsi_pick_vq(). I think the purpose of that function is to keep queueing to the same hwq as long as any commands are being executed. This avoids commands getting reordered, when an application is migrated by the scheduler from one CPU to another, due to having been submitted to different hwqs. I don't think we already have something similar in blk-mq, but this is something that could be discussed further.

(we might want to leave something for LSF ;)).

Agreed :-)

Bart.
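For readers who have not looked at virtio-scsi, the behaviour described above boils down to roughly this - a simplified paraphrase with made-up names, not the actual virtscsi_pick_vq() source:

#include <linux/spinlock.h>

struct tgt_queue_state {
	spinlock_t   lock;
	unsigned int reqs;	/* commands in flight for this target */
	unsigned int cur_hwq;	/* hw queue those commands were sent to */
};

static unsigned int pick_hwq(struct tgt_queue_state *st,
			     unsigned int this_cpu_hwq)
{
	unsigned int hwq;

	spin_lock(&st->lock);
	if (st->reqs++ == 0)
		st->cur_hwq = this_cpu_hwq;	/* idle: re-home to the current CPU's queue */
	hwq = st->cur_hwq;			/* busy: stick with the queue in use */
	spin_unlock(&st->lock);
	return hwq;
}

/* pair with a decrement of st->reqs on command completion */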
Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
On Wed, 2015-01-07 at 15:39 -0600, Mike Christie wrote:
On 01/07/2015 10:57 AM, Hannes Reinecke wrote:
On 01/07/2015 05:25 PM, Sagi Grimberg wrote:
SNIP
Mike? Any comments?

I have been working under the assumption that people would be ok with MCS upstream if we are only using it to handle the issue where we want to do something like have a tcp/iscsi connection per CPU, then map the connection to a blk_mq_hw_ctx. In this more limited MCS implementation there would be no iscsi layer code to do something like load balance across ports or transport paths like how dm-multipath does, so there would be no feature/code duplication. For balancing across hctxs, the iscsi layer would also leave that up to whatever we end up with in upper layers, so again no feature/code duplication with upper layers. So pretty non-controversial I hope :)

If you can make that work, so that we expose MCS in a way that allows upper layers to use it, I'd say it was pretty much perfect. The main objection I've had over the years to multiple connections per session is that it required a duplication of the multipath code within the iscsi initiator (and that was after several long fights to get multipath out of other fabric initiators), so something that doesn't require the duplication overcomes that objection.
If people want to add something like round robin connection selection in the iscsi layer, then I think we want to leave that for after the initial merge, so people can argue about that separately.

Well, you're right, we can argue about it later, but if it's just round robin, why would it be better done in the initiator rather than dm?

James
Re: SHA-1 hashing Algorithm for CHAP
On Thursday, January 8, 2015 7:33:44 PM UTC+5:30, Tejas vaykole wrote:

Hello Mike,

On Monday, January 5, 2015 11:11:20 PM UTC+5:30, Mike Christie wrote:
Could you point me to the SCST code you are referring to? What files/functions/lines?

Here is the link that tells you about the SCST target: http://scst.sourceforge.net/ You can pull the source code with this command on Linux: svn checkout svn://svn.code.sf.net/p/scst/svn/trunk scst-trunk And this is the file that implements CHAP authentication: http://sourceforge.net/p/scst/svn/HEAD/tree/trunk/iscsi-scst/usr/chap.c

I missed out the line numbers and functions earlier. Line 320: static inline void chap_calc_digest_sha1(char chap_id, const char *secret, int secret_len, ... implements the SHA-1 digest. Line 368: } else if (!strcmp(p, "7")) { checks for the assigned number 7.

Why do you need this? I am just curious and would like to try things out here.

On 01/03/2015 03:15 AM, Tejas vaykole wrote:
Hello, I am looking at the SCST target code, where it looks like it supports the SHA-1 algorithm for message digest generation. The number assigned to SHA-1 is '7'. Thanks. Tejas

On Monday, September 15, 2014 11:30:52 AM UTC+5:30, Uli wrote:
Tejas vaykole tejas.v...@gmail.com wrote on 11.09.2014 at 12:22 in message e87c916b-0b75-4570-b690-71197a5c2...@googlegroups.com:
Hello, I am trying out the open-iscsi initiator. I see that the initiator uses the MD5 algorithm for CHAP. I need help in configuring the initiator to use the SHA-1 hashing algorithm for CHAP. Which algorithm number has been assigned to SHA-1 for CHAP? I could not find it. Thanks. Tejas
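For anyone following along, what the SCST function referenced above computes is just the RFC 1994 digest with SHA-1 substituted for MD5: response = H(identifier || secret || challenge). Below is a hedged, self-contained illustration using OpenSSL, not SCST's or open-iscsi's code; note that '7' is SCST's private algorithm number, not a value assigned by the iSCSI RFCs.

#include <stddef.h>
#include <openssl/sha.h>

/* RFC 1994-style CHAP response, with SHA-1 in place of MD5 */
static void chap_sha1_response(unsigned char id,
			       const unsigned char *secret, size_t secret_len,
			       const unsigned char *challenge, size_t chal_len,
			       unsigned char out[SHA_DIGEST_LENGTH])
{
	SHA_CTX ctx;

	SHA1_Init(&ctx);
	SHA1_Update(&ctx, &id, 1);		/* CHAP identifier (CHAP_I) */
	SHA1_Update(&ctx, secret, secret_len);	/* shared secret */
	SHA1_Update(&ctx, challenge, chal_len);	/* challenge (CHAP_C) */
	SHA1_Final(out, &ctx);			/* 20-byte response (CHAP_R) */
}

Both ends have to agree on that algorithm number during CHAP_A negotiation, which is exactly the interoperability point raised in the next reply.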
Re: iscsi_tcp bound to network interface issues after iscsid: retry login for ISCSI_ERR_HOST_NOT_FOUND
On Jan 6, 2015, at 6:40 PM, Chris Leech cle...@redhat.com wrote:
Hi all, It looks to me that the changes in "iscsid: retry login for ISCSI_ERR_HOST_NOT_FOUND" have broken interface binding for iscsi_tcp (and iser, assuming interface binding ...

iser does not do binding.

Is this a regression with the patch:

commit c0e509e7535372cd5d655bc5a20d3d2bae45df84
Author: Mike Christie micha...@cs.wisc.edu
Date: Wed May 7 14:38:13 2014 -0500

    iscsid: retry login for ISCSI_ERR_HOST_NOT_FOUND
Re: iscsi_tcp bound to network interface issues after iscsid: retry login for ISCSI_ERR_HOST_NOT_FOUND
On Thu, Jan 08, 2015 at 10:36:59AM -0600, Michael Christie wrote:
On Jan 6, 2015, at 6:40 PM, Chris Leech cle...@redhat.com wrote:
Hi all, It looks to me that the changes in "iscsid: retry login for ISCSI_ERR_HOST_NOT_FOUND" have broken interface binding for iscsi_tcp (and iser, assuming interface binding ...

iser does not do binding.

Thanks for clearing that up, less to worry about then.

Is this a regression with the patch:

commit c0e509e7535372cd5d655bc5a20d3d2bae45df84
Author: Mike Christie micha...@cs.wisc.edu
Date: Wed May 7 14:38:13 2014 -0500

    iscsid: retry login for ISCSI_ERR_HOST_NOT_FOUND

Yes, it looks that way to me.

- Chris
Re: SHA-1 hashing Algorithm for CHAP
On Jan 8, 2015, at 9:30 AM, Tejas vaykole tejas.vaykol...@gmail.com wrote:
...
Various crypto protocols do indeed use SHA-1 (typically in a more complex form like HMAC) for message authentication, and each of them will obviously have some identifier for that. But that has nothing to do with CHAP. For CHAP in iSCSI, you have to look in the iSCSI RFC, and you will find in there only a single identifier, which is for CHAP using MD5.

Yes, you are right, but there is a correction. The iSCSI RFC (3720), page 186 (section 11.1.4, CHAP), points to RFC 1994 (CHAP) for the implementation of CHAP, and RFC 3720 also mandates that initiators/targets implement MD5 as one required option. But it does not bar the possibility of implementing another hash algorithm with CHAP.

Correct. But implementing it at one end of the protocol has no effect; you need to implement it in both initiator and target. You can pick a random number to indicate “CHAP with SHA-1” (such as the 7 you mentioned) and put that in both initiator and target, if you have the ability to modify both. That will work; at that point you have a proprietary extension to iSCSI. But if you want standard initiators or targets to use SHA-1 in a CHAP exchange, you have to start by getting it added to the standard, and then wait for implementers to implement that new feature.

The other point I would add is “why bother?” There is no cryptographic reason for doing this, given the present state of knowledge around MD5 and other hashes. It might be worthwhile proposing such an extension to the standard as a precaution in case a pre-image attack on MD5 is discovered, but at this point such an attack is entirely hypothetical.

If your answer is “as an experiment, to see if it can be done”, sure. You can do that, and I would predict that you would get it to work pretty easily (again, given that you have control over the implementations of both initiator and target to make matching changes). But if you want to take it beyond an experiment, the first step would be to do the standards work, and the first step in that work is to justify the effort of making the change. I expect that you may have some difficulty convincing others it’s worth the trouble.

paul
Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
On Jan 8, 2015, at 9:11 AM, Bart Van Assche bvanass...@acm.org wrote:
On 01/08/15 14:45, Sagi Grimberg wrote:
Actually I started with that approach, but independent connections under a single session (I-T Nexus) violate the command ordering requirement. Plus, such a solution is specific to iSER…

The iSCSI standard specifies an ordering requirement for the case of multiple connections under a single session. That requirement is in fact a reason why some iSCSI targets have declined to implement multiple connections. On the other hand, there are lots of “MPIO” implementations in many different operating systems that use multiple sessions, so there is no ordering at the iSCSI level, and whatever ordering is required (if any) is instead implemented at higher layers in the requesting OS.

Hello Sagi,

Which command ordering requirement are you referring to? The Linux storage stack does not guarantee that block layer or SCSI commands will be processed in the same order as these commands have been submitted.

Neither does SCSI, in fact. The ordering rules of the SCSI standard are worth studying. They are a lot weaker than most people expect. A particularly interesting case is multiple concurrent writes with overlapping block ranges.

paul