RE: [ofa-general] Re: [ewg] Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
"Caitlin Bestler" <[EMAIL PROTECTED]> wrote on 13.12.2007 22:08:34: > To clarify, an FMR Work Request is simply posted to the SendQ like > any other Work Request (of course the QP has to be privileged, or > it will complete in error). An SQ Post should never block. This would require hardware support, wouldn't it? eHCA2 doesn't have this kind of support, so FMR WRs are not an option here. J. ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
RE: [ofa-general] Re: [ewg] Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
> -----Original Message-----
> From: Joachim Fenkes [mailto:[EMAIL PROTECTED]
> Sent: Thursday, December 13, 2007 1:00 PM
> To: Caitlin Bestler
> Cc: Arnd Bergmann; [EMAIL PROTECTED]; OF-General; LKML;
> linuxppc-dev@ozlabs.org; Or Gerlitz; Roland Dreier; Stefan Roscher
> Subject: Re: [ofa-general] Re: [ewg] Re: [PATCH] IB/ehca: Serialize
> HCA-related hCalls on POWER5
>
> [EMAIL PROTECTED] wrote on 13.12.2007 20:22:49:
>
> > On Dec 13, 2007 12:30 AM, Or Gerlitz <[EMAIL PROTECTED]> wrote:
> > > The current implementation of the open iscsi initiator makes sure to
> > > issue commands in thread (sleepable) context, see iscsi_xmitworker and
> > > references to it in drivers/scsi/libiscsi.c , so this keeps ehca users
> > > safe for the time being.
>
> > I agree, *some* form of FMR support is important for iSER (and probably
> > for NFS over RDMA as well). Rather than adding a crippled NO FMR
> > mode it would make more sense to add support for FMR Work Requests.
> > I'm not certain what, if any, impact that would have on the Power5 problem,
> > but that's certainly a cleaner path for iWARP.
>
> Well, FMR WRs wouldn't change the eHCA issue -- the driver would have to
> make an hCall in any case, and the architecture says that the hCalls used
> in this scenario might return H_LONG_BUSY, causing the driver to sleep. No
> way around that. Because of this, eHCA's FMRs are actually standard MRs
> with a different API.
>
> If, as Or said, the iSCSI initiator issues commands in sleepable context
> anyway, nothing would be lost by using standard MRs as a fallback solution
> if FMRs aren't available, would it?

To clarify, an FMR Work Request is simply posted to the SendQ like
any other Work Request (of course the QP has to be privileged, or
it will complete in error). An SQ Post should never block.

But yes, if the current iSCSI initiator always does all call-based
FMRs in a sleepable context, then I would agree that any changes can
wait for the first vendor that wants to support FMR Work Requests.

FMR Work Requests can be pipelined, so anyone with hardware that
supported them would have strong motivation to enable the open
iSCSI initiator to take advantage of this.
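[Editor's sketch] The point that a send-queue post never blocks, and that several FMR Work Requests can be queued back to back, can be illustrated with a tiny user-space ring buffer. This is only a model under stated assumptions: `send_queue`, `sq_post`, `sq_complete`, the depth of 4 and the `-EAGAIN` return value are all hypothetical names and choices, not the verbs API or any driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SQ_DEPTH 4                      /* hypothetical queue depth */

enum wr_opcode { WR_SEND, WR_FMR };     /* hypothetical opcodes */

struct send_queue {
    enum wr_opcode ring[SQ_DEPTH];
    unsigned head;                      /* next slot to fill */
    unsigned tail;                      /* oldest outstanding request */
};

/* Post a work request. Never waits: a full queue is reported as an error. */
static int sq_post(struct send_queue *sq, enum wr_opcode op)
{
    assert(sq != NULL);
    if (sq->head - sq->tail == SQ_DEPTH)
        return -EAGAIN;                 /* assumed error code for "queue full" */
    sq->ring[sq->head % SQ_DEPTH] = op;
    sq->head++;
    return 0;
}

/* Retire the oldest request, as the hardware would on completion. */
static int sq_complete(struct send_queue *sq)
{
    assert(sq != NULL);
    if (sq->head == sq->tail)
        return -1;                      /* nothing outstanding */
    sq->tail++;
    return 0;
}
```

In this model, pipelining simply means calling `sq_post(&sq, WR_FMR)` several times before any completion is retired; the caller is never put to sleep, it only sees an error once the queue is full.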
Re: [ofa-general] Re: [ewg] Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
[EMAIL PROTECTED] wrote on 13.12.2007 20:22:49:

> On Dec 13, 2007 12:30 AM, Or Gerlitz <[EMAIL PROTECTED]> wrote:
> > The current implementation of the open iscsi initiator makes sure to
> > issue commands in thread (sleepable) context, see iscsi_xmitworker and
> > references to it in drivers/scsi/libiscsi.c , so this keeps ehca users
> > safe for the time being.

> I agree, *some* form of FMR support is important for iSER (and probably
> for NFS over RDMA as well). Rather than adding a crippled NO FMR
> mode it would make more sense to add support for FMR Work Requests.
> I'm not certain what, if any, impact that would have on the Power5 problem,
> but that's certainly a cleaner path for iWARP.

Well, FMR WRs wouldn't change the eHCA issue -- the driver would have to
make an hCall in any case, and the architecture says that the hCalls used
in this scenario might return H_LONG_BUSY, causing the driver to sleep. No
way around that. Because of this, eHCA's FMRs are actually standard MRs
with a different API.

If, as Or said, the iSCSI initiator issues commands in sleepable context
anyway, nothing would be lost by using standard MRs as a fallback solution
if FMRs aren't available, would it?

J.
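[Editor's sketch] The fallback idea raised here (use FMRs where the device offers them, otherwise standard MRs when the caller may sleep) amounts to a small decision function. The sketch below is a user-space illustration only; `dev_caps`, `reg_method` and `choose_reg_method` are hypothetical names and do not correspond to anything in iSER or the verbs layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical registration strategies a ULP could pick from. */
enum reg_method { REG_FMR, REG_STANDARD_MR, REG_UNAVAILABLE };

/* Hypothetical capability summary for a device. */
struct dev_caps {
    bool has_fmr;       /* device offers FMRs that never sleep */
};

/*
 * Prefer FMRs. Fall back to standard MRs only when the caller is in
 * sleepable context, since a standard registration may have to wait.
 */
static enum reg_method choose_reg_method(const struct dev_caps *caps,
                                         bool can_sleep)
{
    assert(caps != NULL);
    if (caps->has_fmr)
        return REG_FMR;
    if (can_sleep)
        return REG_STANDARD_MR;
    return REG_UNAVAILABLE;
}
```

Under the assumption stated in the thread, that the initiator always issues commands from a sleepable thread, the last branch would never be taken.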
Re: [ofa-general] Re: [ewg] Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
On Dec 13, 2007 12:30 AM, Or Gerlitz <[EMAIL PROTECTED]> wrote:
> Roland Dreier wrote:
> > I think the right fix for iSER would be to make iSER work even for
> > devices that don't support FMRs. For example cxgb3 doesn't implement
> > FMRs so if anyone ever updates iSER to work on iWARP and not just IB,
> > then this is something that has to be tackled anyway. Then ehca could
> > just get rid of the FMR support it has.
>
> OK, The iSER design took into account the case of many initiators
> running on strong/modern machines talking to possibly lightweight
> embedded target for which the processing cost per I/O at the target side
> should be minimized, that is at most --one-- RDMA operation should be
> issued by the target to serve an I/O request.
>
> For that end, iSER works with one descriptor (called stag in iWARP and
> rkey in IB) per I/O direction sent from the initiator to the target and
> hence can't work without some sort of FMR implementation.
>
> The current implementation of the open iscsi initiator makes sure to
> issue commands in thread (sleepable) context, see iscsi_xmitworker and
> references to it in drivers/scsi/libiscsi.c , so this keeps ehca users
> safe for the time being.
>
> Or.
>

I agree, *some* form of FMR support is important for iSER (and probably
for NFS over RDMA as well). Rather than adding a crippled NO FMR
mode it would make more sense to add support for FMR Work Requests.
I'm not certain what, if any, impact that would have on the Power5 problem,
but that's certainly a cleaner path for iWARP.
Re: [ewg] Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
Roland Dreier wrote:
> I think the right fix for iSER would be to make iSER work even for
> devices that don't support FMRs. For example cxgb3 doesn't implement
> FMRs so if anyone ever updates iSER to work on iWARP and not just IB,
> then this is something that has to be tackled anyway. Then ehca could
> just get rid of the FMR support it has.

OK, The iSER design took into account the case of many initiators
running on strong/modern machines talking to possibly lightweight
embedded target for which the processing cost per I/O at the target side
should be minimized, that is at most --one-- RDMA operation should be
issued by the target to serve an I/O request.

For that end, iSER works with one descriptor (called stag in iWARP and
rkey in IB) per I/O direction sent from the initiator to the target and
hence can't work without some sort of FMR implementation.

The current implementation of the open iscsi initiator makes sure to
issue commands in thread (sleepable) context, see iscsi_xmitworker and
references to it in drivers/scsi/libiscsi.c , so this keeps ehca users
safe for the time being.

Or.
Re: [ewg] Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
> What is the fix you suggest, to add a device query that tells you for
> which verbs the documentation does not apply? or enhance the code of the
> map_phys_fmr verb within the ehca driver to return error if called
> from non-sleepable context?

I think the right fix for iSER would be to make iSER work even for
devices that don't support FMRs. For example cxgb3 doesn't implement
FMRs so if anyone ever updates iSER to work on iWARP and not just IB,
then this is something that has to be tackled anyway. Then ehca could
just get rid of the FMR support it has.
Re: [ewg] Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
Joachim Fenkes wrote:
> Roland Dreier <[EMAIL PROTECTED]> wrote on 10.12.2007 22:47:37:

>> It's an optional device feature, so this should be OK
>> (although the iSER driver currently seems to depend on a device
>> supporting FMRs, which is probably going to be a problem with iWARP
>> support in the future anyway).

> I don't feel very well with removing code from the driver that iSER seems
> to depend on. Are there plans to fix this in iSER?

What is the fix you suggest, to add a device query that tells you for
which verbs the documentation does not apply? or enhance the code of the
map_phys_fmr verb within the ehca driver to return error if called
from non-sleepable context?

Or.
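[Editor's sketch] The second option mentioned above, returning an error when a possibly sleeping map operation is invoked from a context that must not sleep, can be modelled in user space as follows. Everything here is hypothetical: `call_ctx`, `map_fmr_guarded` and the choice of `-EAGAIN` are illustrative assumptions, and a kernel driver would rely on the kernel's own context checks rather than a flag passed by the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of the calling context. */
struct call_ctx {
    bool can_sleep;     /* false for interrupt context or while holding a spinlock */
};

/*
 * Refuse the mapping up front if it might sleep and the caller cannot.
 * Returns 0 when the operation may proceed, a negative errno otherwise.
 */
static int map_fmr_guarded(const struct call_ctx *ctx, bool op_may_sleep)
{
    assert(ctx != NULL);
    if (op_may_sleep && !ctx->can_sleep)
        return -EAGAIN;     /* assumed error code; the thread does not name one */
    return 0;
}
```

The trade-off is that a consumer relying on the documented "never sleeps" rule would now see failures instead, so this only makes the problem visible; it does not remove it.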
Re: [ewg] Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
Or Gerlitz <[EMAIL PROTECTED]> wrote on 12.12.2007 13:14:25:

> Joachim Fenkes wrote:
> > Roland Dreier <[EMAIL PROTECTED]> wrote on 10.12.2007 22:47:37:
>
> >> It's an optional device feature, so this should be OK
> >> (although the iSER driver currently seems to depend on a device
> >> supporting FMRs, which is probably going to be a problem with iWARP
> >> support in the future anyway).
>
> > I don't feel very well with removing code from the driver that iSER seems
> > to depend on. Are there plans to fix this in iSER?
>
> What is the fix you suggest, to add a device query that tells you for
> which verbs the documentation does not apply? or enhance the code of the
> map_phys_fmr verb within the ehca driver to return error if called
> from non-sleepable context?

Roland, what is your suggestion here? We could implement both versions
Or is proposing, but having both at the same time sounds like overkill.

Christoph R.
Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
Roland Dreier <[EMAIL PROTECTED]> wrote on 10.12.2007 22:47:37:

> It's a big problem. If you cannot implement FMRs in such a way that
> you can handle having map_phys_fmr called in a context that
> can't sleep, then I think the only option is to remove your FMR
> support.

That's kind of what I feared you would say =)

> It's an optional device feature, so this should be OK
> (although the iSER driver currently seems to depend on a device
> supporting FMRs, which is probably going to be a problem with iWARP
> support in the future anyway).

I don't feel very well with removing code from the driver that iSER seems
to depend on. Are there plans to fix this in iSER?

In reality, PHYP rarely ever returns H_LONG_BUSY, and we haven't had any
problems with iSER in the field yet. I admit that our FMR code is
dangerous, but I prefer "dangerous but working for the customer" over "not
working for the customer at all".

Maybe we can agree on keeping the status quo until no more ULPs depend on
FMR, then remove FMR from ehca? If so, we'd also let the _irqsave
spinlocks around hCalls stay in place.

Regards,
  Joachim
Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
> > map_phys_fmr
>
> In fact, we do use hCalls there. Our hardware doesn't actually support FMRs,
> so we translate a "map FMR" into a "reallocate PMR", which doesn't work
> without hCalls. What's more, the hCalls involved (e.g. H_FREE_RESOURCE)
> might well return H_LONG_BUSY, so the whole operation might sleep; no way
> around it.

It's a big problem. If you cannot implement FMRs in such a way that
you can handle having map_phys_fmr called in a context that
can't sleep, then I think the only option is to remove your FMR
support. It's an optional device feature, so this should be OK
(although the iSER driver currently seems to depend on a device
supporting FMRs, which is probably going to be a problem with iWARP
support in the future anyway).

The fact that consumers can map FMRs from interrupt context, while
holding locks, etc, is pretty fundamental to the use of FMRs so I
don't see any way around the requirement that map_phys_fmr never
sleep.

 - R.
Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
Hi, guys,

> We're taking this to the firmware architects at the moment, but they're not
> very fond of the idea of reporting the absence of bugs through capability
> flags, as this could quickly lead to the exhaustion of flag bits. We'll let
> the discussion stew for a bit, but if we don't get this flag, we'll have to
> resort to the CPU features.

The architects have spoken, and we're getting a capability flag for this.
I'll repost my patch with new autodetection code that doesn't involve
checking the processor version.

> > > Regarding the performance problem, have you checked whether converting all
> > > your spin_lock_irqsave to spin_lock/spin_lock_irq improves your performance
> > > on the older machines? Maybe it's already fast enough that way.
> >
> > It does seem that the only places that the hcall_lock is taken also
> > use msleep, so they must always be in process context. So you can
> > safely just use spin_lock(), right?
>
> As Arnd said, there are hCalls that will never return H_LONG_BUSY_*, such as
> H_QUERY_PORT and chums, so they will never sleep. The surrounding functions,
> though, are not prepared to be called from interrupt context (GFP_KERNEL comes
> to mind), so I agree that a simple spin_lock() will suffice. Thanks, Arnd, for
> pointing this out.

As I pointed out in my earlier mail, there's still an issue with
map_phys_fmr possibly sleeping. Let's keep the irqsave for the time being
and revisit this part once we find a solution to map_phys_fmr.

Regards,
  Joachim
Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
On Monday 10 December 2007 00:22, Roland Dreier wrote:
> Fair enough... according to Documentation/infiniband/core_locking.txt,
> the only driver methods that cannot sleep are:
>
> [...]
> map_phys_fmr

In fact, we do use hCalls there. Our hardware doesn't actually support FMRs,
so we translate a "map FMR" into a "reallocate PMR", which doesn't work
without hCalls. What's more, the hCalls involved (e.g. H_FREE_RESOURCE)
might well return H_LONG_BUSY, so the whole operation might sleep; no way
around it.

How should we deal with this?

Thanks,
  Joachim
Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
> I think it needs some more inspection. The msleep in there is only called
> for hcalls that return H_IS_LONG_BUSY(). In theory, you can call
> ehca_plpar_hcall_norets() from inside an interrupt handler if the
> hcall in question never returns long busy.

Fair enough... according to Documentation/infiniband/core_locking.txt,
the only driver methods that cannot sleep are:

    create_ah
    modify_ah
    query_ah
    destroy_ah
    bind_mw
    post_send
    post_recv
    poll_cq
    req_notify_cq
    map_phys_fmr

and I don't think ehca does an hcall from any of those. Of course
there might be other driver-internal code paths that I don't know
about.

Maybe do a quick audit and then stick might_sleep() in the hcall
functions to catch any mistakes?

 - R.
Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
Roland Dreier <[EMAIL PROTECTED]> wrote on 06.12.2007 19:27:09:

> > > + ehca_lock_hcalls = !(cur_cpu_spec->cpu_user_features
> > > +                      & PPC_FEATURE_ARCH_2_05);
> >
> > We already talked about this yesterday, but I still feel that checking the
> > instruction set of the CPU should not be used to determine whether a
> > specific device driver implementation is used in the hypervisor.
>
> I had the same reaction... is testing cpu_user_features really the
> best way to detect this issue?

I concur it's not nice, but it was the only feasible method we could find
without adding a "bug fixed" feature flag to the partition<->firmware
interface. The firmware version reported in the OFDT is not a reliable
enough source, and even if it were, it would require a lot of string
parsing and matching against tables.

We're taking this to the firmware architects at the moment, but they're not
very fond of the idea of reporting the absence of bugs through capability
flags, as this could quickly lead to the exhaustion of flag bits. We'll let
the discussion stew for a bit, but if we don't get this flag, we'll have to
resort to the CPU features.

> I'll hold off applying this for a few days so you guys can decide the
> best thing to do. We'll definitely get some fix into 2.6.24 but we
> have time to make a good decision.

Right.

> > Regarding the performance problem, have you checked whether converting all
> > your spin_lock_irqsave to spin_lock/spin_lock_irq improves your performance
> > on the older machines? Maybe it's already fast enough that way.
>
> It does seem that the only places that the hcall_lock is taken also
> use msleep, so they must always be in process context. So you can
> safely just use spin_lock(), right?

As Arnd said, there are hCalls that will never return H_LONG_BUSY_*, such as
H_QUERY_PORT and chums, so they will never sleep. The surrounding functions,
though, are not prepared to be called from interrupt context (GFP_KERNEL comes
to mind), so I agree that a simple spin_lock() will suffice. Thanks, Arnd, for
pointing this out.

We'll keep you guys posted on the feature flag discussion. Until then, have
a nice weekend!

  Joachim
Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
On Thursday 06 December 2007, Roland Dreier wrote:
> > Regarding the performance problem, have you checked whether converting all
> > your spin_lock_irqsave to spin_lock/spin_lock_irq improves your performance
> > on the older machines? Maybe it's already fast enough that way.
>
> It does seem that the only places that the hcall_lock is taken also
> use msleep, so they must always be in process context. So you can
> safely just use spin_lock(), right?

I think it needs some more inspection. The msleep in there is only called
for hcalls that return H_IS_LONG_BUSY(). In theory, you can call
ehca_plpar_hcall_norets() from inside an interrupt handler if the
hcall in question never returns long busy.

	Arnd <><
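[Editor's sketch] The behaviour described above, where the wrapper sleeps only when a hypervisor call reports "long busy" and therefore never sleeps for calls that cannot report it, can be shown with a small user-space model. This is not the ehca code: `hc_status`, `hcall_retry`, `hcall_stats` and `busy_then_ok` are hypothetical, and the sleep is only counted rather than performed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for hypervisor return values. */
enum hc_status { HC_SUCCESS, HC_LONG_BUSY, HC_HARDWARE };

typedef enum hc_status (*hcall_fn)(void *ctx);

struct hcall_stats {
    int attempts;   /* how often the call was issued */
    int sleeps;     /* how often the wrapper would have slept */
};

/* Issue fn until it stops reporting long-busy or max_tries is exhausted. */
static enum hc_status hcall_retry(hcall_fn fn, void *ctx, int max_tries,
                                  struct hcall_stats *st)
{
    enum hc_status rc = HC_HARDWARE;

    assert(fn != NULL && st != NULL && max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        rc = fn(ctx);
        st->attempts++;
        if (rc != HC_LONG_BUSY)
            return rc;      /* no sleep on this path */
        st->sleeps++;       /* a driver would sleep here before retrying */
    }
    return rc;
}

/* Example call: reports long-busy until the counter in ctx reaches zero. */
static enum hc_status busy_then_ok(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return HC_LONG_BUSY;
    }
    return HC_SUCCESS;
}
```

With a counter of zero the wrapper returns after a single attempt and records no sleeps, which is the case argued to be safe from interrupt context; any non-zero counter makes the sleeping path reachable.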
Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
> > + ehca_lock_hcalls = !(cur_cpu_spec->cpu_user_features
> > +                      & PPC_FEATURE_ARCH_2_05);

> We already talked about this yesterday, but I still feel that checking the
> instruction set of the CPU should not be used to determine whether a
> specific device driver implementation is used in the hypervisor.

I had the same reaction... is testing cpu_user_features really the
best way to detect this issue?

I'll hold off applying this for a few days so you guys can decide the
best thing to do. We'll definitely get some fix into 2.6.24 but we
have time to make a good decision.

> Regarding the performance problem, have you checked whether converting all
> your spin_lock_irqsave to spin_lock/spin_lock_irq improves your performance
> on the older machines? Maybe it's already fast enough that way.

It does seem that the only places that the hcall_lock is taken also
use msleep, so they must always be in process context. So you can
safely just use spin_lock(), right?

 - R.
Re: [PATCH] IB/ehca: Serialize HCA-related hCalls on POWER5
On Thursday 06 December 2007, Joachim Fenkes wrote:
> 	printk(KERN_INFO "eHCA Infiniband Device Driver "
> 	       "(Version " HCAD_VERSION ")\n");
>
> +	/* Autodetect hCall locking -- we can't read the firmware version
> +	 * directly, but we know that starting with POWER6, all firmware
> +	 * versions are good.
> +	 */
> +	if (ehca_lock_hcalls == -1)
> +		ehca_lock_hcalls = !(cur_cpu_spec->cpu_user_features
> +					& PPC_FEATURE_ARCH_2_05);
> +
> 	ret = ehca_create_comp_pool();
> 	if (ret) {
> 		ehca_gen_err("Cannot create comp pool.");

We already talked about this yesterday, but I still feel that checking the
instruction set of the CPU should not be used to determine whether a
specific device driver implementation is used in the hypervisor.

At the very least, I think you should change this to read the hypervisor
version number from the device tree, though the ideal solution would be
to have the absence of this bug encoded in the device node for the ehca
device itself.

Regarding the performance problem, have you checked whether converting all
your spin_lock_irqsave to spin_lock/spin_lock_irq improves your performance
on the older machines? Maybe it's already fast enough that way.

	Arnd <><
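[Editor's sketch] The quoted hunk uses a tri-state setting: -1 means "autodetect", anything else is an explicit choice. The user-space sketch below shows that pattern with the detection input swapped for a firmware-reported flag, in the spirit of the device-node suggestion above. `platform_info`, `fw_reports_fix` and `resolve_lock_hcalls` are hypothetical; the thread does not say how such a flag would actually be exposed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the platform tells the driver. */
struct platform_info {
    bool fw_reports_fix;    /* assumed flag: firmware says serialization is not needed */
};

/*
 * lock_hcalls: -1 = autodetect, 0 = never serialize, 1 = always serialize.
 * Returns the effective 0/1 setting.
 */
static int resolve_lock_hcalls(int lock_hcalls, const struct platform_info *pi)
{
    assert(pi != NULL);
    assert(lock_hcalls >= -1 && lock_hcalls <= 1);
    if (lock_hcalls != -1)
        return lock_hcalls;             /* an explicit setting always wins */
    return pi->fw_reports_fix ? 0 : 1;  /* serialize unless told it is safe */
}
```

Defaulting to serialization when the flag is absent keeps older firmware safe at the cost of performance, which matches the direction of the patch.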