Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000

liu ping fan Sun, 28 Oct 2012 22:24:58 -0700

On Fri, Oct 26, 2012 at 2:48 AM, Jan Kiszka <jan.kis...@siemens.com> wrote:
> On 2012-10-25 19:02, Avi Kivity wrote:
>> On 10/25/2012 06:39 PM, Jan Kiszka wrote:
>>>>
>>>> That doesn't work cross-thread.
>>>>
>>>> vcpu A: write to device X, dma-ing to device Y
>>>> vcpu B: write to device Y, dma-ing to device X
>>>
>>> We will deny DMA-ing from device X on behalf of a VCPU, ie. in dispatch
>>> context, to Y.
>>>
>>> What we do not deny, though, is DMA-ing from an I/O thread that
>>> processes an event for device X.
>>
>> I would really like to avoid depending on the context.  In real hardware, 
>> there is no such thing.
>
> The point is how we deal with any kind of access to a device that
> requires taking that device's lock while holding another lock, provided
> that scenario can also take place in reverse order at the same time.
> Known scenarios are:
>
>  - vcpu 1 -> access device A -> access device B
>  - vcpu 2 -> access device B -> access device A
>
>  - event 1 -> device A event processing -> access device B
>  - event 2 -> device B event processing -> access device A
>
> and combinations of those pairs.
>
>>
>>> If the invoked callback of device X
>>> holds the device lock across some DMA request to Y, then we risk to run
>>> into the same ABBA issue. Hmm...
>>
>> Yup.
>>
>>>
>>>>
>>>> My suggestion was to drop the locks around DMA, then re-acquire the lock
>>>> and re-validate data.
>>>
>>> Maybe possible, but hairy depending on the device model.
>>
>> It's unpleasant, yes.
>>
>> Note depending on the device, we may not need to re-validate data, it may be 
>> sufficient to load it into local variables to we know it is consistent at 
>> some point.  But all those solutions suffer from requiring device model 
>> authors to understand all those issues, rather than just add a simple lock 
>> around access to their data structures.
>
> Right. And therefor it is a suboptimal way to start (patching).
>
>>
>>>>>>> I see that we have a all-or-nothing problem here: to address this
>>>>>>> properly, we need to convert the IRQ path to lock-less (or at least
>>>>>>> compatible with holding per-device locks) as well.
>>>>>>
>>>>>> There is a transitional path where writing to a register that can cause
>>>>>> IRQ changes takes both the big lock and the local lock.
>>>>>>
>>>>>> Eventually, though, of course all inner subsystems must be threaded for
>>>>>> this work to have value.
>>>>>>
>>>>>
>>>>> But that transitional path must not introduce regressions. Opening a
>>>>> race window between IRQ cause update and event injection is such a
>>>>> thing, just like dropping concurrent requests on the floor.
>>>>
>>>> Can you explain the race?
>>>
>>> Context A                            Context B
>>>
>>> device.lock
>>> ...
>>> device.set interrupt_cause = 0
>>> lower_irq = true
>>> ...
>>> device.unlock
>>>                                      device.lock
>>>                                      ...
>>>                                      device.interrupt_cause = 42
>>>                                      raise_irq = true
>>>                                      ...
>>>                                      device.unlock
>>>                                      if (raise_irq)
>>>                                              bql.lock
>>>                                              set_irq(device.irqno)
>>>                                              bql.unlock
>>> if (lower_irq)
>>>      bql.lock
>>>      clear_irq(device.irqno)
>>>      bql.unlock
>>>
>>>
>>> And there it goes, our interrupt event.
>>
>> Obviously you'll need to reacquire the device lock after taking bql and 
>> revalidate stuff.  But that is not what I am suggesting.  Instead, any path 
>> that can lead to an irq update (or timer update etc) will take both the bql 
>> and the device lock.  This will leave after the first pass only side effect 
>> free register reads and writes, which is silly if we keep it that way, but 
>> we will follow with a threaded timer and irq subsystem and we'll peel away 
>> those big locks.
>>
>>   device_mmio_write:
>>     if register is involved in irq or timers or block layer or really 
>> anything that matters:
>>       bql.acquire
>>     device.lock.acquire
>>     do stuff
>>     device.lock.release
>>     if that big condition from above was true:
>>       bql.release
>
> Looks simpler than it is as you cannot wrap complete handlers with that
> pattern. An example where it would fail (until we solved the locking
> issues above):
>
> mmio_write:
>   bql.conditional_lock
>   device.lock
>   device.check_state
>   issue_dma
>   device.update_state
>   update_irq, play_with_timers, etc.
>   device.unlock
>   bql.conditional_unlock
>
> If that DMA request hits an unconverted MMIO region or one that takes
> BQL conditionally as above, we will lock up (or bail out as our mutexes
> detect the error). E1000's start_xmit looks like this so far, and that's
> a pretty import service.
>
> Moreover, I prefer having a representative cut-through over enjoying to
> merge a first step that excludes some 80% of the problems. For that
> reason I would be even be inclined to start with addressing the IRQ
> injection topic first (patch-wise), then the other necessary backend
> services for the e1000 or whatever and convert some device(s) last.
>
> IOW: cut out anything from this series that touches e1000 until the
> building blocks for converting it reasonably are finished. Carrying
> experimental, partially broken conversion on top is fine, try to merge
> pieces of that not, IMHO.
>
Agreed.  Just want to take this opportunity to discuss what is the
next and what is nut.


Regards,
pingfan

> Jan
>
> --
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [patch v4 12/16] e1000: apply fine lock on e1000

Reply via email to