Re: [spi-devel-general] [PATCH 1/2 V2] spi/mmc_spi: SPI bus locking API, using mutex
Ernst Schwab esch...@online.de wrote:

  int spi_async(struct spi_device *spi, struct spi_message *message)
  {
  [...]
  +	if (master->bus_lock_flag)
  +		ret = -EBUSY;
  +	else
  +		ret = __spi_async(spi, message);
  +	return ret;
  }

I'm still not satisfied with this. The code present in the kernel (without patches) allows spi_async() to be called concurrently from interrupt context and process context, without any loss of data. With the first patch attempt (bus lock implemented in the master driver), no messages were lost during the bus lock. With the current patch, drivers calling spi_async() from interrupt context (are there any??) will get an error code, which they cannot handle without adding additional queuing code in the driver.

A possible solution would be to add an extra queue, handled in the spi.c module and stored in the master's data structure, that stores all messages coming in while the bus lock is active. The queue can be flushed by calling __spi_async() for each message as soon as the bus lock is removed.

Regards
Ernst

--
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
spi-devel-general mailing list
spi-devel-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/spi-devel-general
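The objection above can be made concrete with a minimal userspace model of the patched behaviour: while the bus lock flag is set, spi_async() rejects messages with -EBUSY instead of queuing them. All names here are illustrative stand-ins for the kernel structures, not the real API.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a master with the V2 patch applied. */
struct master_model {
	int bus_lock_flag;	/* set while a client holds the bus lock */
	int queued;		/* messages accepted so far */
};

static int __spi_async_model(struct master_model *m)
{
	m->queued++;		/* the real code would call master->transfer() */
	return 0;
}

/* The V2 patch: reject rather than queue while the bus is locked. */
static int spi_async_model(struct master_model *m)
{
	if (m->bus_lock_flag)
		return -EBUSY;	/* interrupt-context callers cannot handle this */
	return __spi_async_model(m);
}
```

An interrupt-context caller hitting the -EBUSY branch has no way to retry without its own queuing code, which is exactly the complaint raised in the mail.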
Re: [spi-devel-general] [PATCH 1/2 V2] spi/mmc_spi: SPI bus locking API, using mutex
On Thu, Feb 25, 2010 at 3:36 AM, Ernst Schwab esch...@online.de wrote:
On Wed, 24 Feb 2010 13:32:43 -0700 Grant Likely grant.lik...@secretlab.ca wrote:
On Thu, Feb 18, 2010 at 5:10 AM, Ernst Schwab esch...@online.de wrote:

[...] I have concerns about the use of the spin lock around the __spi_async() call, as Mike Frysinger already remarked. [...]

I think the spinlock should always be held when calling the bus driver's ->transfer() hook. That will also allow us to remove driver-specific locking from the bus drivers. Otherwise we've got an inconsistent locking scheme, which is asking for trouble down the road.

Hm, I do not agree with that. I think it is a bad idea to have a spinlock around a function call that may do a lot of things (we don't know about it in the spi.c module) and may block multiple CPU cores or disable interrupts for an indefinite time. The list_for_each_entry(xfer, ...) loop may process many elements; during that time, all other cores are locked out without need.

The implementation present in the kernel (without patches) apparently made the assumption that master->transfer() itself can be called from process and interrupt context concurrently and does its own locking. This is true for the (few) drivers I looked at. The list_for_each_entry(xfer, &message->transfers, transfer_list) loop seems to be safe without locking (although I'm not absolutely sure about it), since it only seems to access data stored in the message passed as a function argument.

I looked at them all this morning. Without exception, every single driver follows this pattern:

- [maybe check some stuff]
- spin_lock_irqsave()
- list_add_tail()
- trigger workqueue or other way to kick off the transfer
- spin_unlock_irqrestore()

Yet every driver implements its own transfer list and spin lock in its private data structure. That says to me that the spi layer is abstracting at the wrong level.
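The common ->transfer() pattern listed above can be sketched in a small userspace model. The spinlock is reduced to comments since there is no concurrency here, and all names are invented for the example rather than taken from any driver.

```c
#include <stddef.h>

/* Hypothetical queued message and controller state. */
struct qmsg {
	struct qmsg *next;
	int id;
};

struct ctlr {
	struct qmsg *head, *tail;	/* message queue, normally lock-protected */
	int kicks;			/* times the worker was triggered */
};

/* The pattern every driver implements privately today. */
static void ctlr_transfer(struct ctlr *c, struct qmsg *m)
{
	/* spin_lock_irqsave(&c->lock, flags); */
	m->next = NULL;
	if (c->tail)
		c->tail->next = m;
	else
		c->head = m;
	c->tail = m;			/* list_add_tail() in the real driver */
	c->kicks++;			/* queue_work(...) or similar kick */
	/* spin_unlock_irqrestore(&c->lock, flags); */
}
```

Since every driver repeats this boilerplate, hoisting the queue and lock into the core would deduplicate exactly this function.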
I'm cautious about taking away the ability of drivers to make their own decisions about how to manage transfers, but it seems like the transfer queue and spin lock should be provided by the common code. The trick is how to do it while providing a migration path for drivers.

One option is to deprecate the ->transfer hook and allow drivers to provide a .request() hook to signal the driver that new transfers have been added to the list, something like how the block layer works. That would keep the critical region in spi_async() small. Another option is to let drivers provide a .prepare() hook that validates and prepares a message before entering the critical region. The .prepare() option is probably the simplest to implement and has the lowest impact on existing device drivers wanting to use the new functionality, but the .request() option might be better in the long run.

With our additional spinlocking around the whole thing, we add significant additional locking time without need. I suggest removing the spinlocking in spi_async() (see patch below).

The spinlock is absolutely required. Otherwise the test of bus_lock_flag is racy. For example, in the mmc case, a non-mmc message could get queued after the mmc code obtains the bus lock. Consider two threads, A and B, where A is running mmc code, and the following sequence of events:

A1: obtain mutex
A2: obtain spinlock
A3: set flag
A4: release spinlock
A5: call async_transfer_locked()
A6: call async_transfer_locked()
A7: call async_transfer_locked()
A8: clear flag
A9: release mutex

B1: obtain spinlock
B2: check flag; bail if set
B3: call .transfer() hook
B4: release spinlock

In this case, there is no way for the B3 transfer to occur between A2 and A8.
However, if B1 and B4 are omitted, you can get this sequence:

A1: obtain mutex
A2: obtain spinlock
B2: check flag; bail if set
A3: set flag
A4: release spinlock
A5: call async_transfer_locked()
B3: call .transfer() hook
A6: call async_transfer_locked()
A7: call async_transfer_locked()
A8: clear flag
A9: release mutex

See? Without the spinlock there is nothing preventing B3 from getting called in the midst of A's bus lock.

But: this is all true for the original design; there seems to be no protection against accidental reentrant use of spi_async().

spi_async() is intended to be callable from atomic context so that IRQ handlers can trigger a new transfer without waiting around for it to complete. It is also totally valid for both process and IRQ contexts to call spi_async() at the same time, which is exactly why the spinlock should be required.

Seems the spinlock is already in master->transfer() (I missed that, because I checked the wrong function...), so it is valid that multiple contexts (process, IRQ) call master->transfer() concurrently.

On Thu, Feb 25, 2010 at 3:57 AM, Ernst Schwab esch...@online.de wrote:
Ernst Schwab esch...@online.de wrote:

  int spi_async(struct spi_device *spi, struct spi_message *message)
  {
  [...]
  +	if (master->bus_lock_flag)
  +		ret = -EBUSY;
  +	else
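The broken interleaving above can be replayed deterministically in a tiny userspace model: B samples the flag (B2) before A sets it (A3), so B's transfer (B3) lands inside A's locked section. This is purely illustrative; the point of the kernel spinlock is to make the flag check and the ->transfer() call a single atomic step.

```c
/* Globals model the shared master state; names are invented. */
static int bus_lock_flag;
static int transfer_inside_lock;

static void b_transfer(void)
{
	if (bus_lock_flag)
		transfer_inside_lock = 1;	/* B3 happened during A's lock */
}

/* Replay the exact event order from the mail: B2, A3, B3, A8. */
static int replay_race(void)
{
	int saw_locked;

	saw_locked = bus_lock_flag;	/* B2: check flag; it is still clear */
	bus_lock_flag = 1;		/* A3: A sets the flag */
	if (!saw_locked)
		b_transfer();		/* B3: proceeds on the stale check */
	bus_lock_flag = 0;		/* A8: A clears the flag */
	return transfer_inside_lock;
}
```

With B1/B4 (the spinlock) in place, B2 and B3 could never be separated by A2–A4, and the stale check would be impossible.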
Re: [spi-devel-general] [PATCH 1/2 V2] spi/mmc_spi: SPI bus locking API, using mutex
On Thu, 25 Feb 2010 09:38:48 -0700 Grant Likely grant.lik...@secretlab.ca wrote:

I looked at them all this morning. Without exception, every single driver follows this pattern:

- [maybe check some stuff]
- spin_lock_irqsave()
- list_add_tail()
- trigger workqueue or other way to kick off the transfer
- spin_unlock_irqrestore()

Yet every driver implements its own transfer list and spin lock in its private data structure. That says to me that the spi layer is abstracting at the wrong level. I'm cautious about taking away the ability of drivers to make their own decisions about how to manage transfers, but it seems like the transfer queue and spin lock should be provided by the common code. The trick is how to do it while providing a migration path for drivers.

One option is to deprecate the ->transfer hook and allow drivers to provide a .request() hook to signal the driver that new transfers have been added to the list, something like how the block layer works. That would keep the critical region in spi_async() small. Another option is to let drivers provide a .prepare() hook that validates and prepares a message before entering the critical region. The .prepare() option is probably the simplest to implement and has the lowest impact on existing device drivers wanting to use the new functionality, but the .request() option might be better in the long run.

I'd vote for the .request() hook.

The spinlock is absolutely required. Otherwise the test of bus_lock_flag is racy. [...] See? Without the spinlock there is nothing preventing B3 from getting called in the midst of A's bus lock.

Ok, now I see; an interrupt executed on a different core than the process context will cause a malfunction if the spinlock is missing.

A possible solution would be to add an extra queue, handled in the spi.c module and stored in the master's data structure, that stores all messages coming in while the bus lock is active.
The queue can be flushed by calling __spi_async() for each message as soon as the bus lock is removed.

That also works. In fact, this would work well with the .request() hook suggestion I made above, because when the lock is removed, the delayed list could just be appended to the normal request list. A .prepare() hook would also be useful because the message can still be pre-checked for validity outside of the critical region before being added to the delay list.

Maybe all of the list/queue handling should be done in spi.c, with (inline) functions in spi.h; then the driver could call a 'spi_get_next_message()' function and would get a message out of one of the two queues:

- spi_sync_locked() and spi_async_locked() will add to the 'direct' queue and call .request()
- spi_sync() and spi_async() will add to the 'delayed' queue if the spi bus is locked, or to the 'direct' queue if unlocked, and call .request()
- spi_bus_unlock() will append the entire 'delayed' queue to the 'direct' queue and call .request()
- spi_get_next_message() will get the next message from the 'direct' queue

If you want to attempt this, maintain it as a separate patch on top of the core bus locking patch.

I will look at it if I find the time...

Regards
Ernst
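The two-queue scheme sketched above can be modeled in userspace: spi_async() feeds a 'delayed' queue while the bus is locked, bus unlock splices 'delayed' onto 'direct' in one step, and the driver only ever consumes 'direct'. All names here are hypothetical.

```c
#include <stddef.h>

struct node { struct node *next; };

/* Hypothetical per-master queue state. */
struct bus {
	struct node *direct, *direct_tail;
	struct node *delayed, *delayed_tail;
	int locked;
};

static void enqueue(struct node **head, struct node **tail, struct node *n)
{
	n->next = NULL;
	if (*tail)
		(*tail)->next = n;
	else
		*head = n;
	*tail = n;
}

/* spi_async(): route to delayed while locked, else to direct. */
static void bus_async(struct bus *b, struct node *n)
{
	if (b->locked)
		enqueue(&b->delayed, &b->delayed_tail, n);
	else
		enqueue(&b->direct, &b->direct_tail, n);  /* + call .request() */
}

/* spi_bus_unlock(): splice the whole delayed list onto direct. */
static void bus_unlock(struct bus *b)
{
	if (b->delayed) {
		if (b->direct_tail)
			b->direct_tail->next = b->delayed;
		else
			b->direct = b->delayed;
		b->direct_tail = b->delayed_tail;
		b->delayed = b->delayed_tail = NULL;
	}
	b->locked = 0;					  /* + call .request() */
}

/* spi_get_next_message(): drivers only ever see the direct queue. */
static struct node *get_next_message(struct bus *b)
{
	struct node *n = b->direct;

	if (n) {
		b->direct = n->next;
		if (!b->direct)
			b->direct_tail = NULL;
	}
	return n;
}
```

The splice in bus_unlock() is O(1), which matches the later suggestion that the delayed list "could just be appended" rather than flushed message by message.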
Re: [spi-devel-general] [PATCH 1/2 V2] spi/mmc_spi: SPI bus locking API, using mutex
On Thu, Feb 25, 2010 at 10:25 AM, Ernst Schwab esch...@online.de wrote:
On Thu, 25 Feb 2010 09:38:48 -0700 Grant Likely grant.lik...@secretlab.ca wrote:

I looked at them all this morning. Without exception, every single driver follows this pattern:

- [maybe check some stuff]
- spin_lock_irqsave()
- list_add_tail()
- trigger workqueue or other way to kick off the transfer
- spin_unlock_irqrestore()

Yet every driver implements its own transfer list and spin lock in its private data structure. That says to me that the spi layer is abstracting at the wrong level. I'm cautious about taking away the ability of drivers to make their own decisions about how to manage transfers, but it seems like the transfer queue and spin lock should be provided by the common code. The trick is how to do it while providing a migration path for drivers.

One option is to deprecate the ->transfer hook and allow drivers to provide a .request() hook to signal the driver that new transfers have been added to the list, something like how the block layer works. That would keep the critical region in spi_async() small. Another option is to let drivers provide a .prepare() hook that validates and prepares a message before entering the critical region. The .prepare() option is probably the simplest to implement and has the lowest impact on existing device drivers wanting to use the new functionality, but the .request() option might be better in the long run.

I'd vote for the .request() hook.

It may even be advantageous to provide both .prepare and .request hooks. .prepare would allow messages to be checked out of line before touching any critical regions at all. The two hooks are pretty much mutually independent.

The spinlock is absolutely required. Otherwise the test of bus_lock_flag is racy. [...] See? Without the spinlock there is nothing preventing B3 from getting called in the midst of A's bus lock.
Ok, now I see; an interrupt executed on a different core than the process context will cause a malfunction if the spinlock is missing.

It is even broken on a single processor. If an IRQ occurs between B2 and B3, and the IRQ goes down an SPI bus-lock path, then things still get out of order.

A possible solution would be to add an extra queue, handled in the spi.c module and stored in the master's data structure, that stores all messages coming in while the bus lock is active. The queue can be flushed by calling __spi_async() for each message as soon as the bus lock is removed.

That also works. In fact, this would work well with the .request() hook suggestion I made above, because when the lock is removed, the delayed list could just be appended to the normal request list. A .prepare() hook would also be useful because the message can still be pre-checked for validity outside of the critical region before being added to the delay list.

Maybe all of the list/queue handling should be done in spi.c, with (inline) functions in spi.h; then the driver could call a 'spi_get_next_message()' function and would get a message out of one of the two queues:

Try not to get too exotic here. It should be sufficient for the driver to only look at the 'direct' queue and let the core code decide when to move stuff from delayed to direct. In fact, the existing list_del_init() call sites will probably work as-is with just a change of the pointer to the list head.

- spi_sync_locked() and spi_async_locked() will add to the 'direct' queue and call .request()
- spi_sync() and spi_async() will add to the 'delayed' queue if the spi bus is locked, or to the 'direct' queue if unlocked, and call .request()
- spi_bus_unlock() will append the entire 'delayed' queue to the 'direct' queue and call .request()

Sounds appropriate. However, I think you'll want to avoid having to change every bus driver all at once. I think this can be done without breaking drivers.
If you leave the .transfer() hook in place and add the .request() hook, then drivers can provide one or the other. If a .transfer() hook is provided, then everything works as before: spi_async*() calls the transfer hook for each message received, and if the bus is locked, spi_async() adds messages to the delayed queue without calling .transfer(). If .transfer is NULL, then the setup code should set .transfer to a 'stock' transfer function that only adds the message to the direct queue. Drivers that migrate to this mode then take advantage of the nice short critical region provided by the stock .transfer() hook.

In either case, spi_bus_unlock() will walk the delayed queue and call .transfer() on each message. If .request is set, then it should be called after each .transfer() call and also at the end of spi_bus_unlock().

Once all the drivers are migrated to the .request() API, the stock transfer hook can be folded directly into __spi_async(), the hook removed from struct spi_master, and the spi_bus_unlock() path simplified.
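The migration path above could look roughly like this in a userspace model: a controller that leaves .transfer NULL gets a stock hook installed at setup time, which only appends to the core-owned queue. Names and the fixed-size queue are invented for the example.

```c
#include <stddef.h>

struct message;		/* opaque; only pointers are handled here */

/* Hypothetical controller with an optional legacy .transfer hook. */
struct controller {
	int (*transfer)(struct controller *, struct message *);
	struct message *queue[8];	/* stand-in for the 'direct' list */
	int count;
};

/* Stock hook: the short critical region is just this append. */
static int stock_transfer(struct controller *c, struct message *m)
{
	if (c->count >= 8)
		return -1;
	c->queue[c->count++] = m;	/* core would also call .request() */
	return 0;
}

/* Setup installs the stock hook only when the driver provides none. */
static void controller_setup(struct controller *c)
{
	if (!c->transfer)
		c->transfer = stock_transfer;
}
```

Legacy drivers keep their own .transfer and see no behaviour change, while migrated drivers get the short stock critical region for free.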
Re: [spi-devel-general] [PATCH 1/2 V2] spi/mmc_spi: SPI bus locking API, using mutex
On 02/25/2010 11:38 AM, Grant Likely wrote:
On Thu, Feb 25, 2010 at 3:36 AM, Ernst Schwab esch...@online.de wrote:
On Wed, 24 Feb 2010 13:32:43 -0700 Grant Likely grant.lik...@secretlab.ca wrote:
On Thu, Feb 18, 2010 at 5:10 AM, Ernst Schwab esch...@online.de wrote:

[...] I have concerns about the use of the spin lock around the __spi_async() call, as Mike Frysinger already remarked. [...]

I think the spinlock should always be held when calling the bus driver's ->transfer() hook. That will also allow us to remove driver-specific locking from the bus drivers. Otherwise we've got an inconsistent locking scheme, which is asking for trouble down the road.

Hm, I do not agree with that. I think it is a bad idea to have a spinlock around a function call that may do a lot of things (we don't know about it in the spi.c module) and may block multiple CPU cores or disable interrupts for an indefinite time. The list_for_each_entry(xfer, ...) loop may process many elements; during that time, all other cores are locked out without need.

The implementation present in the kernel (without patches) apparently made the assumption that master->transfer() itself can be called from process and interrupt context concurrently and does its own locking. This is true for the (few) drivers I looked at. The list_for_each_entry(xfer, &message->transfers, transfer_list) loop seems to be safe without locking (although I'm not absolutely sure about it), since it only seems to access data stored in the message passed as a function argument.

I looked at them all this morning. Without exception, every single driver follows this pattern:

- [maybe check some stuff]
- spin_lock_irqsave()
- list_add_tail()
- trigger workqueue or other way to kick off the transfer
- spin_unlock_irqrestore()

I am only familiar with the pxa2xx_spi.c driver. This driver has a lock in ->transfer() that is constructed as above.
The lock protects internal data structures that are shared between ->transfer(), a workqueue that pulls the next message from the queue, and a tasklet that prepares each transfer within a message for processing by the hardware and returns any completed messages. In addition to the list_add_tail() mentioned above, there is also private (perhaps could be public) message state information that gets set.

As a side note: I noticed that the data being protected in ->transfer() is only accessed in workqueue and tasklet context, never in interrupt context. Thus, in my (greatly extended) private version of pxa2xx_spi.c, I added a new lock to the private driver data and swapped spin_lock_irqsave() for spin_lock_bh() in all places that protect the message queue and state, so that there would be less (however small) impact on hardware interrupts.

Yet every driver implements its own transfer list and spin lock in its private data structure. That says to me that the spi layer is abstracting at the wrong level. I'm cautious about taking away the ability of drivers to make their own decisions about how to manage transfers, but it seems like the transfer queue and spin lock should be provided by the common code. The trick is how to do it while providing a migration path for drivers.

I don't see how migrating the queue and locking to the core saves a lot. You still need a queue per controller driver. It would provide standardization of the function, but the function is not complicated, as you point out above. Most of the manipulation of the queue still exists inside the driver. The existing spi core design allows drivers to add whatever other state information they need within the same lock. I'm not sure every driver's private state is standard enough that it could become public. If not, it is important (helpful, required?) that the state be updated atomically with the queue entry.
One option is to deprecate the ->transfer hook and allow drivers to provide a .request() hook to signal the driver that new transfers have been added to the list, something like how the block layer works. That would keep the critical region in spi_async() small. Another option is to let drivers provide a .prepare() hook that validates and prepares a message before entering the critical region. The .prepare() option is probably the simplest to implement and has the lowest impact on existing device drivers wanting to use the new functionality, but the .request() option might be better in the long run.

You definitely need something like .prepare if this change is made. As David Brownell has pointed out in the past, the driver is responsible for rejecting any message that it cannot implement, preferably at the time ->transfer() is called. This principle is not universally implemented in all drivers (pxa2xx_spi rejects some messages at the time it tries to execute a transfer, rather than checking the whole message when queued, as it should). Reasons for rejection include an unimplementable clock speed, length, bits/word, etc. Only
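As a rough illustration of the up-front validation a .prepare() hook could perform, the sketch below rejects a message whose parameters the controller cannot satisfy before it ever enters a queue. The specific limits are invented for the example; real drivers would check against their hardware's capabilities.

```c
#include <errno.h>

/* Hypothetical per-transfer parameters worth checking at queue time. */
struct xfer_params {
	unsigned int speed_hz;
	unsigned int bits_per_word;
};

/* Reject whole messages before queueing, as the mail recommends. */
static int prepare_check(const struct xfer_params *p)
{
	if (p->bits_per_word < 4 || p->bits_per_word > 16)
		return -EINVAL;		/* word size this hardware can't do */
	if (p->speed_hz == 0 || p->speed_hz > 25000000)
		return -EINVAL;		/* clock out of range */
	return 0;
}
```

Doing this outside the critical region means an invalid message never touches the queue lock at all, which is the point of splitting .prepare() from .request().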
Re: [spi-devel-general] [PATCH 1/2 V2] spi/mmc_spi: SPI bus locking API, using mutex
Hey Ernst. Good patch, thank you.

The merge window is going to open any moment now, and this patch hasn't had any linux-next exposure, so I'm not going to pick it up for 2.6.34. Instead, I'll wait until after the merge window closes and add it to my next branch so that it gets lots of exposure before the 2.6.35 merge window. Since this is a pretty core change, I'd like to have as much feedback and testing on it as possible before I commit to pushing it into mainline. Anyone who is following along and is interested, please test these two patches and let me know if they work for you. Ping me after Linus tags v2.6.34-rc1 to make sure I don't forget to put it in my -next branch.

Some comments below...

On Thu, Feb 18, 2010 at 5:10 AM, Ernst Schwab esch...@online.de wrote:

I integrated your changes in V2; I hope I did not miss anything.

On Wed, 17 Feb 2010 13:03:04 -0700 Grant Likely grant.lik...@secretlab.ca wrote:

  extern int spi_setup(struct spi_device *spi);
  extern int spi_async(struct spi_device *spi, struct spi_message *message);
  +extern int
  +spi_async_locked(struct spi_device *spi, struct spi_message *message);

Please keep return parameters and annotations on the same line as the function name (makes grep results more useful). You can break lines between parameters.

Hm, the prototype for spi_alloc_master in the same file led me to assume that this should be the style used in this header file. Normally, I wouldn't do a line break there, ever...

Yeah, things are inconsistent, but in this case both styles are present in the file, and when grepping I find it more important to have the annotations than the parameters immediately visible.

Something more important: I have concerns about the use of the spin lock around the __spi_async() call, as Mike Frysinger already remarked.
Why do we 'need' this spin lock:

- by definition, spi_async can be called from any context:

    * Context: any (irqs may be blocked, etc)
    *
    * This call may be used in_irq and other contexts which can't sleep,
    * as well as from task contexts which can sleep.

- we are not sure that spi_async isn't used reentrantly (e.g. from an interrupt service routine and from normal context)
- __spi_async calls the master->transfer() function, which is not required to be reentrant (or is it?)

Um, I'm not sure I follow the argument. Spin locks are safe to obtain in any context. Mutexes are the ones that cannot be held in atomic context. The spin lock does more than just protect the bus_is_locked flag. It also prevents overlap between process and IRQ context running in the relevant code. Basically, the spin_lock_irqsave() either defers all IRQ processing on a uniprocessor, or has other processors spin in SMP.

I think the spinlock should always be held when calling the bus driver's ->transfer() hook. That will also allow us to remove driver-specific locking from the bus drivers. Otherwise we've got an inconsistent locking scheme, which is asking for trouble down the road.

But: this is all true for the original design; there seems to be no protection against accidental reentrant use of spi_async().

spi_async() is intended to be callable from atomic context so that IRQ handlers can trigger a new transfer without waiting around for it to complete. It is also totally valid for both process and IRQ contexts to call spi_async() at the same time, which is exactly why the spinlock should be required.

Hmmm. Are you concerned about a code path where an spi_async() call will call the bus driver, which will end up calling spi_async() on itself? I cannot think of a use case where that would be desirable.

g.

--
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.
Re: [spi-devel-general] [PATCH 1/2 V2] spi/mmc_spi: SPI bus locking API, using mutex
I integrated your changes in V2; I hope I did not miss anything.

On Wed, 17 Feb 2010 13:03:04 -0700 Grant Likely grant.lik...@secretlab.ca wrote:

[...] spi_sync() and spi_sync_locked() are largely identical. The only difference is that spi_sync() needs to obtain the mutex. Rename spi_sync() to __spi_sync() and add a 'get_lock' flag parameter. Then make spi_sync() and spi_sync_locked() call __spi_sync() with the flag set appropriately.

I made the parameter 'bus_locked' instead of 'get_lock' because we do not only get the lock but also release it after use.

[...]

  extern int spi_setup(struct spi_device *spi);
  extern int spi_async(struct spi_device *spi, struct spi_message *message);
  +extern int
  +spi_async_locked(struct spi_device *spi, struct spi_message *message);

Please keep return parameters and annotations on the same line as the function name (makes grep results more useful). You can break lines between parameters.

Hm, the prototype for spi_alloc_master in the same file led me to assume that this should be the style used in this header file. Normally, I wouldn't do a line break there, ever...

Something more important: I have concerns about the use of the spin lock around the __spi_async() call, as Mike Frysinger already remarked.

Why do we 'need' this spin lock:

- by definition, spi_async can be called from any context:

    * Context: any (irqs may be blocked, etc)
    *
    * This call may be used in_irq and other contexts which can't sleep,
    * as well as from task contexts which can sleep.

- we are not sure that spi_async isn't used reentrantly (e.g. from an interrupt service routine and from normal context)
- __spi_async calls the master->transfer() function, which is not required to be reentrant (or is it?)

But: this is all true for the original design; there seems to be no protection against accidental reentrant use of spi_async(). Maybe we should check whether there is any current call of spi_async() from a context that cannot sleep, and change the definition of where it is allowed to be called from, if possible.
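The 'bus_locked' parameter scheme described above can be modeled in userspace, with the mutex reduced to a counter so the behaviour is observable. All names are invented; the real __spi_sync() takes the spi device and message as well.

```c
/* "Mutex" state: >0 while held; max_depth records worst-case nesting. */
static int mutex_depth;
static int max_depth;

/* __spi_sync(): take the bus mutex only if the caller doesn't hold it. */
static int __spi_sync_model(int bus_locked)
{
	if (!bus_locked)
		mutex_depth++;	/* mutex_lock(&master->bus_lock_mutex) */
	if (mutex_depth > max_depth)
		max_depth = mutex_depth;
	/* ... issue the transfer and wait for completion ... */
	if (!bus_locked)
		mutex_depth--;	/* mutex_unlock(&master->bus_lock_mutex) */
	return 0;
}

static int spi_sync_model(void)        { return __spi_sync_model(0); }
static int spi_sync_locked_model(void) { return __spi_sync_model(1); }
```

The flag thus prevents the deadlock that would occur if spi_sync_locked() tried to re-acquire a mutex its caller already holds via spi_bus_lock().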
Regards
Ernst
Re: [spi-devel-general] [PATCH 1/2 V2] spi/mmc_spi: SPI bus locking API, using mutex
Followup patch to correct a missing return value.

Signed-off-by: Ernst Schwab esch...@online.de
---
Apparently I failed to look at the compiler warnings, sorry for that.

diff -upr a/drivers/spi/spi.c b/drivers/spi/spi.c
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -852,7 +852,7 @@ static int __spi_sync(struct spi_device
  */
 int spi_sync(struct spi_device *spi, struct spi_message *message)
 {
-	__spi_sync(spi, message, 0);
+	return __spi_sync(spi, message, 0);
 }
 EXPORT_SYMBOL_GPL(spi_sync);

@@ -874,7 +874,7 @@ EXPORT_SYMBOL_GPL(spi_sync);
  */
 int spi_sync_locked(struct spi_device *spi, struct spi_message *message)
 {
-	__spi_sync(spi, message, 1);
+	return __spi_sync(spi, message, 1);
 }
 EXPORT_SYMBOL_GPL(spi_sync_locked);