On Thu, Jul 20, 2017 at 5:34 AM, Eric W. Biederman <ebied...@xmission.com> wrote:
> Ingo Molnar <mi...@kernel.org> writes:
>
>> * Andrew Morton <a...@linux-foundation.org> wrote:
>>
>>> On Wed, 19 Jul 2017 15:54:27 -0700 Davidlohr Bueso <d...@stgolabs.net> wrote:
>>>
>>> > On Wed, 19 Jul 2017, Andrew Morton wrote:
>>> >
>>> > >I do rather dislike these conversions from the point of view of
>>> > >performance overhead and general code bloat.  But I seem to have lost
>>> > >that struggle and I don't think any of these are fastpath(?).
>>> >
>>> > Well, since we now have fd25d19 (locking/refcount: Create unchecked
>>> > atomic_t implementation), performance is supposed to be ok.
>>>
>>> Sure, things are OK for people who disable the feature.
>>
>> So with the WIP fast-refcount series from Kees:
>>
>>    [PATCH v6 0/2] x86: Implement fast refcount overflow protection
>>
>> I believe the robustness difference between optimized-refcount_t and
>> full-refcount_t will be marginal.
>>
>> I.e. we'll be able to have both higher API safety _and_ performance.
>>
>>> But for people who want to enable the feature we really should minimize
>>> the cost by avoiding blindly converting sites which simply don't need it:
>>> simple, safe, old, well-tested code.  Why go and slow down such code?
>>> Need to apply some common sense here...
>>
>> It's old, well-tested code _for existing, sane parameters_, until someone
>> finds a decade old bug in one of these with an insane parameters no-one
>> stumbled upon so far, and builds an exploit on top of it.
>>
>> Only by touching all these places do we have a chance to improve things
>> measurably in terms of reducing the probability of bugs.
>
> The more I hear people pushing the upsides of refcount_t without
> considering the downsides the more I dislike it.
>
> - refcount_t is really the wrong thing because it uses saturation
>   semantics.  So by definition it includes a bug.

This is a feature, not a bug. :) If the kernel has a refcount overflow
flaw (which, in the pantheon of exploitable kernel bugs, is
_common_[1], as I've referenced earlier), then we're downgrading an
exploitable use-after-free to a harmless memory allocation leak. Even
if you don't include malicious attackers in the consideration, this
changes a memory corruption of unknown results into a memory leak.
That's actually an _improvement_ to availability and integrity.

> - refcount_t will only really prevent something if there is an extra
>   increment.  That is not the kind of bug people are likely to make.

Like I've said, this is common. This is usually a mistake in error
handling which forgets (or misplaces) a "put".

> - refcount_t won't help if you have an extra decrement.  The bad
>   use-after-free will still happen.

Yes, and not having a protected refcount_t will also allow a
use-after-free. There is no change here, so it's not a "downside" of
refcount_t. In fact, having gained the implicit annotation of
refcount_t being a refcounter (rather than a simple atomic_t) means
that auditing users is easier and more focused. This could reduce the
chance people make mistakes in the first place, especially since the
API is more constrained than atomic_t.

> - refcount_t won't help if there is a memory stomp.  As with an extra
>   decrement the bad use-after-free will still happen.

A stomp of the refcount_t value itself? Sure, and this remains as
vulnerable as atomic_t. This isn't a downside to refcount_t. And
again, since there _is_ checking of the value in places, it's possible
an actionable warning will be produced (though, yes, after the
use-after-free has been exposed), which is a benefit over simple
atomic_t. I mention this in the commit log ("better to maybe produce
the warning than be universally silent").
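
To make the saturation argument concrete, here is a minimal userspace
sketch in C. It is not the kernel's refcount_t implementation: the names
sketch_ref, wrapping_inc, saturating_inc and freed_while_in_use are
hypothetical, and pinning the counter at UINT_MAX is an assumption chosen
for illustration. It models an error path that keeps taking references
without dropping them, followed by one legitimate "put".

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical userspace model of a reference counter (assumption:
 * this is NOT the kernel's refcount_t, only an illustration). */
struct sketch_ref {
	unsigned int count;
};

/* Unchecked counter: an increment at UINT_MAX wraps around to 0. */
static void wrapping_inc(struct sketch_ref *r)
{
	r->count++;
}

/* Saturating counter: once the value reaches UINT_MAX it stays there. */
static void saturating_inc(struct sketch_ref *r)
{
	if (r->count != UINT_MAX)
		r->count++;
}

/* Both "put" helpers return 1 when the object would be freed. Dropping
 * a reference at zero is the extra-decrement bug, which neither
 * counter prevents, so this model simply rules it out. */
static int wrapping_dec_and_test(struct sketch_ref *r)
{
	assert(r->count != 0);
	return --r->count == 0;
}

static int saturating_dec_and_test(struct sketch_ref *r)
{
	assert(r->count != 0);
	if (r->count == UINT_MAX)
		return 0;	/* saturated: leak the object, never free it */
	return --r->count == 0;
}

/* Simulate a path that forgets its "put". To avoid looping billions of
 * times, start two below the limit, as if the leaked references had
 * already piled up on top of one legitimate holder. Returns 1 if the
 * final put frees the object while that holder still uses it. */
static int freed_while_in_use(int use_saturation)
{
	struct sketch_ref r = { .count = UINT_MAX - 1 };
	int i;

	for (i = 0; i < 3; i++) {	/* three more leaked "get" calls */
		if (use_saturation)
			saturating_inc(&r);
		else
			wrapping_inc(&r);
	}

	/* One legitimate put from the path that took the last reference. */
	return use_saturation ? saturating_dec_and_test(&r)
			      : wrapping_dec_and_test(&r);
}
```

In this model freed_while_in_use(0) reports a free of an object that
still has a holder (the use-after-free case), while freed_while_in_use(1)
reports that the object is never freed, so the overflow costs a leaked
allocation instead.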

> So all I see is a huge amount of code churn to implement a buggy (by
> definition) refcounting API, that risks adding new bugs and only truly
> helps with bugs that are unlikely in the first place.

Given that the conversions alone have been uncovering refcount bugs
and that the implementation isn't "buggy" (it provides a specific set
of protections), I strongly disagree with your assessment.

> I really don't think this is an obvious slam dunk.

It entirely blocks a commonly exploitable flaw in the kernel. This
isn't a probabilistic mitigation, either. While I'm not sure I'd ever
describe a security protection as a slam dunk, I think this is up
there. :)

-Kees

[1] When I say "common", I'm speaking from the perspective of security
flaw frequency. The kernel sees about 1-2 high severity security flaws
a year (with an average lifetime of 5 years), and the
refcount-overflow use-after-free class of flaw is normally reliable
for attackers (and I'd classify as high severity). With 2016 seeing
two known separate refcount-overflow use-after-free flaws, this could
be better described as an epidemic, but I'll try to be less
inflammatory and just say "common".

-- 
Kees Cook
Pixel Security