On Monday 10 December 2012 11:52:18 Hans J. Koch wrote:
> On Mon, Dec 10, 2012 at 11:03:59AM +0200, Vitalii Demianets wrote:
> > On Saturday 08 December 2012 01:47:21 Hans J. Koch wrote:
> > > On Fri, Dec 07, 2012 at 05:00:54PM +0200, Vitalii Demianets wrote:
> > > > > On second thought, we can't call enable_irq()/disable_irq()
> > > > > unconditionally because of the potential disable counter
> > > > > (irq_desc->depth) disbalance. That's why we need UIO_IRQ_DISABLED 
flag,
> > > > > and that's why we should check it in uio_pdrv_genirq_irqcontrol().
> > > > > On the other hand, you are right in that we don't need to check it
> > > > > inside irq handler. Inside irq handler we can disable irq and set 
the
> > > > > flag unconditionally, because:
> > > > > a) We know for sure that irqs are enabled, because we are inside
> > > > > (not-shared) irq handler;
> > > > >  and
> > > > > b) We are guarded from potential race conditions by 
spin_lock_irqsave()
> > > > > call in uio_pdrv_genirq_irqcontrol().
> > > > >
> > > > > So,yes, we can get rid of costly atomic call to
> > > > > test_and_set_bit(UIO_IRQ_DISABLED,..) inside irq handler. But I 
still
> > > > > don't like the idea of mixing this optimization with bug fixes in a
> > > > > single patch.
> > > >
> > > > On the third thought, we can't ;)
> > > > Imagine the SMP system where uio_pdrv_genirq_irqcontrol() is being
> > > > executed on CPU0 and irq handler is running concurrently on CPU1. To
> > > > protect from disable_irq counter disbalance we must first check 
current
> > > > irq status, and in atomic manner. Thus we prevent  double-disable, one
> > > > from
> > > > uio_pdrv_genirq_irqcontrol() running on CPU0 and another form irq 
handler
> > > > running on CPU1.
> > > > Above consideration justifies current code.
> > > >
> > > > But it seems we have potential concurrency problem here anyway. Here 
is
> > > > theoretical scenario:
> > > > 1) Initial state: irq is enabled, uio_pdrv_genirq_irqcontrol() starts
> > > > being executed on CPU0 with irq_on=1 and at the same time, 
concurrently,
> > > > irq handler starts being executed on CPU1;
> > > > 2) irq handler executes line
> > > >    if (!test_and_set_bit(UIO_IRQ_DISABLED, &priv->flags))
> > > > as irq was enabled, the condition holds. And now UIO_IRQ_DISABLED is 
set.
> > > > 3) uio_pdrv_genirq_irqcontrol() executes line
> > > >    if (test_and_clear_bit(UIO_IRQ_DISABLED, &priv->flags))
> > > > as UIO_IRQ_DISABLED was set by CPU1, the condition holds. And now
> > > > UIO_IRQ_DISABLED is cleared.
> > > > 4) CPU0 executes enable_irq() . IRQ is enabled. "Unbalanced enable for
> > > > IRQ %d\n" warning is issued.
> > > > 5) irq handler on CPU1 executes disable_irq_nosync(). IRQ is disabled.
> > > >
> > > > And now we are in situation where IRQ is disabled, but 
UIO_IRQ_DISABLED
> > > > is cleared. Bad.
> > > >
> > > > The above scenario is purely theoretical, I have no means to check it 
in
> > > > hardware.
> > > > The (theoretical) solution is to guard UIO_IRQ_DISABLED test and 
actual
> > > > irq disabling inside irq handler by priv->lock.
> > > >
> > > > What do you think about it? Is it worth worrying about?
> > >
> > > Hi Vitalii,
> > > thanks a lot for analyzing the problem so thoroughly. It made me review
> > > uio_pdrv_genirq.c again, and I noticed several issues and came to the
> > > following conclusions:
> > >
> > > 1.) priv->lock is completely unnecessary. It is only used in one 
function,
> > > so there's nothing it could possibly protect.
> > >
> > > 2.) All these "test_and_clear_bit" and "test_and_set_bit" calls are also
> > > unnecessary. We can simply use enable_irq and disable_irq in both the
> > > irq handler and in uio_pdrv_genirq_irqcontrol.
> > >
> > > We should go "back to the roots" and have a look at how UIO works.
> > > The workflow it is intended for is like this:
> > >
> > > 1.) Hardware is in Reset State (e.g. after boot). Any decent hardware
> > > has its interrupt disabled at that time.
> > >
> > > 2.) uio_pdrv_genirq is loaded. Kernel enables the irq.
> > >
> > > 3.) Userspace part of the driver comes up. It will initialize the 
hardware
> > > (including setting the bits that enable the interrupt).
> > >
> > > 4.) Userspace will then issue a blocking read() on /dev/uioX. Typically,
> > > there'll be a loop or a thread with this blocking read() at the 
beginning
> > > (usually using the select() call).
> > >
> > > 5.) At some time, a hardware interrupt will occur. The irq handler in
> > > kernel space will be called, only to disable the irq. This will also 
cause
> > > the UIO core to make /dev/uioX readable.
> > >
> > > 6.) Userspace's blocking read returns. Userspace does its work by
> > > reading/writing device memory.
> > >
> > > 7.) As the last thing, Userspace writes a "1" to /dev/uioX, which causes
> > > uio_pdrv_genirq_irqcontrol to be called, re-enabling the interrupt.
> > >
> > > 8.) Goto 4.)
> > >
> > > We should also remember that uio_pdrv_genirq_handler() is NOT a real
> > > hardware irq handler. The real handler is in the UIO core, which will
> > > increment the event number and wake up userspace.
> > >
> > > So, although your scenario clearly shows a subtle race condition, there 
is
> > > none. If userspace does stupid things, no harm will be done to the 
kernel.
> > 
> > With all due respect, I disagree here. If userspace does stupid things, 
> > unintentionally because of poorly written code or intentionally because of 
a 
> > malicious purpose, we could have irq depth counter disbalanced which, in 
> > turn, could lead to the interrupt not enabled when it should be.
> 
> It's just the interrupt of a broken UIO userspace driver. Note that this is
> also a security mechanism, disabling the irq when it is not handled 
properly.
> 
> > In which 
> > case the whole system would stop doing useful things which it was designed 
to 
> > do.
> 
> If the kernel irq depth counter can't track the irq, I don't really see a
> point in fixing it by adding extra flags in a simple driver.
> 

The flag in driver would be needed if we supposed that userspace can do stupid 
things. For example, userspace can call write() twice with zero data. Which 
would lead to disabling irq twice. Which means that someone would need to 
enable irq twice. Which is not in the userspace-kernel interaction scenario 
you so clearly described above previously.

Also, without that flag the irq could be disabled twice even when userspace do 
not do double-disable. It could happen when userspace is writing to the 
device file AND irq is running at this same moment on another CPU core 
concurrently.
Yes, you are right that such a situation could be avoided if userspace code 
was written carefully. But how can you assure that every user will write good 
code?

> > 
> > > If userspace is designed the way described above (and in the
> > > documentation), it will always wake up with its interrupt disabled, do 
its
> > > work, and then re-enable the interrupt. You can probably think of a few
> > > things userspace could do to screw things up. But that's not our 
problem.
> > >
> > 
> > Disagree again. I think we can not leave a hole in kernel which opens DOS 
> > possibilities and hope userspace will behave well. There are always errors 
in 
> > the code. And security issues, too.
> 
> UIO is a risky thing to begin with. There are many more possibilities to
> do bad things. That's why UIO devices are root-only.
> 

We can assume that code running under root does not do bad things on purpose. 
But that doesn't mean that code running under root is bug-free.

> > 
> > > Could you hack up a patch for that? I think it should start with 
removing
> > > uio_pdrv_genirq_platdata->lock and uio_pdrv_genirq_platdata->flags...
> > >
> > > Thanks again for your work. What do you think about my theory?
> > >
> > 
> > I think that we should fix the bug, not ignore it.
> 
> I don't see a bug here.
> 

There is no bug as long as you trust userspace code.
Personally I don't trust it. I don't trust even my own code. There were plenty 
of times when I looked at my code ant thought: "Oh gosh, how could I be so 
dumb!"
And kernel should behave properly no matter what stupid things userspace does, 
shouldn't it? Isn't it one of the reasons why we have distinction between 
kernel and user space at all? Because kernel is by definition trusted code, 
and userspace is not.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to