Re: [git pull] drm fixes for 6.6-rc1
On Thu, 7 Sept 2023 at 19:45, Dave Airlie wrote: > > Just a poke about the outstanding drm CI support pull request since I > haven't see any movement on that in the week, hopefully it's not as > difficult a problem as bcachefs :-) I was assuming that it wouldn't interfere with anything else... and that I could just ignore it until I have all my "real" pulls done. I didn't want to even look at it until I was "done". Yes, it's past the mid-way point of the second week of the merge window, and I mostly _should_ be "done". But I still keep finding new pull requests in my inbox that aren't just fixes and updates for previous main pull requests. Linus
Re: [git pull] drm CI integration
On Wed, 30 Aug 2023 at 18:00, Dave Airlie wrote: > > This is a PR to add drm-ci support files to the upstream tree. So I finally had no other pull requests pending, and spent some time looking at this, and I see nothing offensive. I did wonder how this then expands to having more than one subsystem using this (and mixing them possibly on the same CI system), but that's more about my ignorance about how the gitlab CI works than anything else, so that's certainly not a real concern. The other side of that "I do wonder" coin is for when others want to use the same tests but using some other CI infrastructure, whether it's some AWS or google cloud thing, or github or whatever. Anyway, considering that both of my idle curiosity reactions were about "if this is successful", I think me having those questions only means that I should pull this, rather than questioning the pull itself. If it works out so well that others want to actually do this and integrate our other selftests in similar manners, I think that would be lovely. And if - as you say - this is a failure and the whole thing gets deleted a year from now as "this didn't go anywhere", it doesn't look like it should cause a ton of problems either. Anyway, it's in my tree now, let's see where it goes. Linus
Re: [PATCH v5 0/9] Improve the copy of task comm
On Sun, 4 Aug 2024 at 00:56, Yafang Shao wrote: > > There is a BUILD_BUG_ON() inside get_task_comm(), so when you use > get_task_comm(), it implies that the BUILD_BUG_ON() is necessary. Let's just remove that silly BUILD_BUG_ON(). I don't think it adds any value, and honestly, it really only makes this patch-series uglier when reasonable uses suddenly pointlessly need that double-underscore version.. So let's aim at (a) documenting that the last byte in 'tsk->comm{}' is always guaranteed to be NUL, so that the thing can always just be treated as a string. Yes, it may change under us, but as long as we know there is always a stable NUL there *somewhere*, we really really don't care. (b) removing __get_task_comm() entirely, and replacing it with a plain 'str*cpy*()' functions The whole (a) thing is a requirement anyway, since the *bulk* of tsk->comm really just seems to be various '%s' things in printk strings etc. And once we just admit that we can use the string functions, all the get_task_comm() stuff is just unnecessary. And yes, some people may want to use the strscpy_pad() function because they want to fill the whole destination buffer. But that's entirely about the *destination* use, not the tsk->comm[] source, so it has nothing to do with any kind of "get_task_comm()" logic, and it was always wrong to care about the buffer sizes magically matching. Hmm? Linus
Re: [PATCH v5 0/9] Improve the copy of task comm
On Mon, 5 Aug 2024 at 20:01, Yafang Shao wrote: > > One concern about removing the BUILD_BUG_ON() is that if we extend > TASK_COMM_LEN to a larger size, such as 24, the caller with a > hardcoded 16-byte buffer may overflow. No, not at all. Because get_task_comm() - and the replacements - would never use TASK_COMM_LEN. They'd use the size of the *destination*. That's what the code already does: #define get_task_comm(buf, tsk) ({ \ ... __get_task_comm(buf, sizeof(buf), tsk); \ note how it uses "sizeof(buf)". Now, it might be a good idea to also verify that 'buf' is an actual array, and that this code doesn't do some silly "sizeof(ptr)" thing. We do have a helper for that, so we could do something like #define get_task_comm(buf, tsk) \ strscpy_pad(buf, __must_be_array(buf)+sizeof(buf), (tsk)->comm) as a helper macro for this all. (Although I'm not convinced we generally want the "_pad()" version, but whatever). Linus
Re: [PATCH v7 5/8] mm/util: Fix possible race condition in kstrdup()
On Sat, 17 Aug 2024 at 01:48, Alejandro Colomar wrote: > > I would compact the above to: > > len = strlen(s); > buf = kmalloc_track_caller(len + 1, gfp); > if (buf) > strcpy(mempcpy(buf, s, len), ""); No, we're not doing this kind of horror. If _FORTIFY_SOURCE has problems with a simple "memcpy and add NUL", then _FORTIFY_SOURCE needs to be fixed. We don't replace a "buf[len] = 0" with strcpy(,""). Yes, compilers may simplify it, but dammit, it's an unreadable incomprehensible mess to humans, and humans still matter a LOT more. Linus
Re: [PATCH 8/9] drm/xe/tests: Fix printf format specifiers in xe_migrate test
On Wed, 21 Feb 2024 at 21:05, Lucas De Marchi wrote: > > this has a potential to cause conflicts with upcoming work, so I think > it's better to apply this through drm-xe-next. Let me know if you agree. I disagree. Violently. For this to be fixed, we need to have the printf format checking enabled. And we can't enable it until all the problems have been fixed. Which means that we should *not* have to wait for [N] different trees to fix their issues separately. This should get fixed in the Kunit tree, so that the Kunit tree can just send a pull request to me to enable format checking for the KUnit tests, together with all the fixes. Trying to spread those fixes out to different git branches will only result in pain and pointless dependencies between different trees. Honestly, the reason I noticed the problem in the first place was that the drm tree had a separate bug, that had been apparently noted in linux-next, and *despite* that it made it into a pull request to me and caused new build failures in rc5. So as things are, I am not IN THE LEAST interested in some kind of "let us fix this in the drm tree separately" garbage. We're not making things worse by trying to fix this in different trees. We're fixing this in the Kunit tree, and I do not want to get *more* problems from the drm side. I've had enough. Linus
Re: [PATCH 2/9] lib/cmdline: Fix an invalid format specifier in an assertion msg
On Thu, 22 Feb 2024 at 09:36, Daniel Latypov wrote: > > Copying the line for context, it's about `p-r` where > p = memchr_inv(&r[1], 0, sizeof(r) - sizeof(r[0])); > `p-r` should never be negative unless something has gone horribly > horribly wrong. Sure it would - if 'p' is NULL. Of course, then a negative value wouldn't be helpful either, and in this case that's what the EXPECT_PTR_EQ checking is testing in the first place, so it's a non-issue. IOW, in practice clearly the sign should simply not matter here. I do think that the default case for pointer differences should be that they are signed, because they *can* be. Just because of that "default case", unless there's some actual reason to use '%tu', I think '%td' should be seen as the normal case to use. That said, just as a quick aside: be careful with pointer differences in the kernel. For this particular case, when we're talking about just 'char *', it's not a big deal, but we've had code where people didn't think about what it means to do a pointer difference in C, and how it can be often unnecessarily expensive due to the implied "divide by the size of the pointed object". Sometimes it's actually worth writing the code in ways that avoids pointer differences entirely (which might involve passing around indexes instead of pointers). Linus
Re: [PATCH next v2 08/11] minmax: Add min_const() and max_const()
On Sun, 25 Feb 2024 at 08:53, David Laight wrote: > > The expansions of min() and max() contain statement expressions so are > not valid for static intialisers. > min_const() and max_const() are expressions so can be used for static > initialisers. I hate the name. Naming shouldn't be about an implementation detail, particularly not an esoteric one like the "C constant expression" rule. That can be useful for some internal helper functions or macros, but not for something that random people are supposed to USE. Telling some random developer that inside an array size declaration or a static initializer you need to use "max_const()" because it needs to syntactically be a constant expression, and our regular "max()" function isn't that, is just *horrid*. No, please just use the traditional C model of just using ALL CAPS for macro names that don't act like a function. Yes, yes, that may end up requiring getting rid of some current users of #define MIN(a,b) ((a)<(b) ? (a):(b)) but dammit, we don't actually have _that_ many of them, and why should we have random drivers doing that anyway? Linus
Re: [PATCH 1/3] kci-gitlab: Introducing GitLab-CI Pipeline for Kernel Testing
On Thu, 29 Feb 2024 at 01:23, Nikolai Kondrashov wrote: > > However, I think a better approach would be *not* to add the .gitlab-ci.yaml > file in the root of the source tree, but instead change the very same repo > setting to point to a particular entry YAML, *inside* the repo (somewhere > under "ci" directory) instead. I really don't want some kind of top-level CI for the base kernel project. We already have the situation that the drm people have their own ci model. II'm ok with that, partly because then at least the maintainers of that subsystem can agree on the rules for that one subsystem. I'm not at all interested in having something that people will then either fight about, or - more likely - ignore, at the top level because there isn't some global agreement about what the rules are. For example, even just running checkpatch is often a stylistic thing, and not everybody agrees about all the checkpatch warnings. I would suggest the CI project be separate from the kernel. And having that slack channel that is restricted to particular companies is just another sign of this whole disease. If you want to make a google/microsoft project to do kernel CI, then more power to you, but don't expect it to be some kind of agreed-upon kernel project when it's a closed system. Linus
Re: [PATCH 1/3] kci-gitlab: Introducing GitLab-CI Pipeline for Kernel Testing
On Fri, 1 Mar 2024 at 02:27, Nikolai Kondrashov wrote: > > I agree, it's hard to imagine even a simple majority agreeing on how GitLab CI > should be done. Still, we would like to help people, who are interested in > this kind of thing, to set it up. How about we reframe this contribution as a > sort of template, or a reference for people to start their setup with, > assuming that most maintainers would want to tweak it? We would also be glad > to stand by for questions and help, as people try to use it. Ack. I think seeing it as a library for various gitlab CI models would be a lot more palatable. Particularly if you can then show that yes, it is also relevant to our currently existing drm case. So I'm not objecting to having (for example) some kind of CI helper templates - I think a logical place would be in tools/ci/ which is kind of alongside our tools/testing subdirectory. (And then perhaps have a 'gitlab' directory under that. I'm not sure whether - and how much - commonality there might be between the different CI models of different hosts). Just to clarify: when I say "a logical place", I very much want to emphasize the "a" - maybe there are better places, and I'm not saying that is the only possible place. But it sounds more logical to me than some. Linus
Re: [git pull] drm for 6.9-rc1
On Tue, 12 Mar 2024 at 21:07, Dave Airlie wrote: > > I've done a trial merge into your tree from a few hours ago, there > are definitely some slighty messy conflicts, I've pushed a sample > branch here: I appreciate your sample merges since I like verifying my end result, but I think your merge is wrong. I got two differences when I did the merge. The one in intel_dp_detect() I think is just syntactic - I ended up placing the if (!intel_dp_is_edp(intel_dp)) intel_psr_init_dpcd(intel_dp); differently than you did (I did it *after* the tunnel_detect()). I don't _think,_ that placement matters, but somebody more familiar with the code should check it out. Added Animesh and Jani to the participants. But I think your merge gets the TP_printk() for the xe_bo_move trace event is actively wrong. You don't have the destination for the move in the printk. Or maybe I got it wrong. Our merges end up _close_, but not identical. Linus
Re: Linux 6.10-rc1
On Fri, 14 Jun 2024 at 02:02, Pavel Machek wrote: > > If I can get at least basic metric on the gpu (%idle? which process > use how much time?), it might be feasible. Is there tool similar for > top? Let's bring in the actual gpu people.. Dave/Jani/others - does any of this sound familiar? Pavel says things have gotten much slower in 6.10: "something was very wrong with the performance, likely to do with graphics" To bisect it, he'd need some way to judge it reasonably well and without too much of a bias. See https://lore.kernel.org/all/zmrtzozoi0t%2ft...@duo.ucw.cz/ for the original report. Thinkpad X220 - which I assume means old intel integrated GPU - at least one listing I found for that thing is i5-2430M, with "Intel® HD Graphics 3000". Linus
Re: Linux 6.10-rc1
On Fri, 14 Jun 2024 at 09:21, Linus Torvalds wrote: > > Let's bring in the actual gpu people.. Dave/Jani/others - does any of > this sound familiar? Pavel says things have gotten much slower in > 6.10: "something was very wrong with the performance, likely to do > with graphics" Actually, maybe it's not graphics at all. Rafael just sent me a pull request that fixes a "turbo is disabled at boot, but magically enabled at runtime by firmware" issue. The 6.10-rc1 kernel would notice that turbo was disabled, and stopped noticing that it magically got re-enabled. Pavel, that was with a very different laptop, but who knows... That would match the "laptop is much slower" thing. So current -git might be worth checking. Linus
Re: Linux 6.10-rc1
On Wed, 19 Jun 2024 at 03:44, Pavel Machek wrote: > > Ok, so machine is ready to be thrown out of window, again. Trying to > play 29C3 video should not make machine completely unusable ... as in > keyboard looses keystrokes in terminal. Well, that at least sounds like you can bisect it with a very clear test-case? Even if you don't bisect all the way, just doing a handful of bisections tends to narrow things down enough that we can at least guess at what general kind of area it is... Linus
[PATCH] epoll: try to be a _bit_ better about file lifetimes
epoll is a mess, and does various invalid things in the name of performance. Let's try to rein it in a bit. Something like this, perhaps? Not-yet-signed-off-by: Linus Torvalds --- This is entirely untested, thus the "Not-yet-signed-off-by". But I think this may be kind of the right path forward. I suspect the ->poll() call is the main case that matters, but there are other places where eventpoll just looks up the file pointer without then being very careful about it. The sock_from_file(epi->ffd.file) uses in particular should probably also use this to look up the file. Comments? fs/eventpoll.c | 30 +- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 882b89edc52a..bffa8083ff36 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -285,6 +285,30 @@ static inline void free_ephead(struct epitems_head *head) kmem_cache_free(ephead_cache, head); } +/* + * The ffd.file pointer may be in the process of + * being torn down due to being closed, but we + * may not have finished eventpoll_release() yet. + * + * Technically, even with the atomic_long_inc_not_zero, + * the file may have been free'd and then gotten + * re-allocated to something else (since files are + * not RCU-delayed, they are SLAB_TYPESAFE_BY_RCU). + * + * But for epoll, we don't much care. + */ +static struct file *epi_fget(const struct epitem *epi) +{ + struct file *file; + + rcu_read_lock(); + file = epi->ffd.file; + if (!atomic_long_inc_not_zero(&file->f_count)) + file = NULL; + rcu_read_unlock(); + return file; +} + static void list_file(struct file *file) { struct epitems_head *head; @@ -987,14 +1011,18 @@ static __poll_t __ep_eventpoll_poll(struct file *file, poll_table *wait, int dep static __poll_t ep_item_poll(const struct epitem *epi, poll_table *pt, int depth) { - struct file *file = epi->ffd.file; + struct file *file = epi_fget(epi); __poll_t res; + if (!file) + return 0; + pt->_key = epi->event.events; if (!is_file_epoll(file)) res = vfs_poll(file, pt); else res = __ep_eventpoll_poll(file, pt, depth); + fput(file); return res & epi->event.events; } -- 2.44.0.330.g4d18c88175
Re: get_file() unsafe under epoll (was Re: [syzbot] [fs?] [io-uring?] general protection fault in __ep_remove)
On Fri, 3 May 2024 at 14:11, Al Viro wrote: > > What we need is > * promise that ep_item_poll() won't happen after > eventpoll_release_file(). > AFAICS, we do have that. > * ->poll() not playing silly buggers. No. That is not enough at all. Because even with perfectly normal "->poll()", and even with the ep_item_poll() happening *before* eventpoll_release_file(), you have this trivial race: ep_item_poll() ->poll() and *between* those two operations, another CPU does "close()", and that causes eventpoll_release_file() to be called, and now f_count goes down to zero while ->poll() is running. So you do need to increment the file count around the ->poll() call, I feel. Or, alternatively, you'd need to serialize with eventpoll_release_file(), but that would need to be some sleeping lock held over the ->poll() call. > As it is, dma_buf ->poll() is very suspicious regardless of that > mess - it can grab reference to file for unspecified interval. I think that's actually much preferable to what epoll does, which is to keep using files without having reference counts to them (and then relying on magically not racing with eventpoll_release_file(). Linus
Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Fri, 3 May 2024 at 14:24, Al Viro wrote: > > Can we get to ep_item_poll(epi, ...) after eventpoll_release_file() > got past __ep_remove()? Because if we can, we have a worse problem - > epi freed under us. Look at the hack in __ep_remove(): if it is concurrent with eventpoll_release_file(), it will hit this code spin_lock(&file->f_lock); if (epi->dying && !force) { spin_unlock(&file->f_lock); return false; } and not free the epi. But as far as I can tell, almost nothing else cares about the f_lock and dying logic. And in fact, I don't think doing spin_lock(&file->f_lock); is even valid in the places that look up file through "epi->ffd.file", because the lock itself is inside the thing that you can't trust until you've taken the lock... So I agree with Kees about the use of "atomic_dec_not_zero()" kind of logic - but it also needs to be in an RCU-readlocked region, I think. I wish epoll() just took the damn file ref itself. But since it relies on the file refcount to release the data structure, that obviously can't work. Linus
Re: get_file() unsafe under epoll (was Re: [syzbot] [fs?] [io-uring?] general protection fault in __ep_remove)
On Fri, 3 May 2024 at 14:36, Al Viro wrote: > > ... the last part is no-go - poll_wait() must be able to grab a reference > (well, the callback in it must) Yeah. I really think that *poll* itself is doing everything right. It knows that it's called with a file pointer with a reference, and it adds its own references as needed. And I think that's all fine - both for dmabuf in particular, but for poll in general. That's how things are *supposed* to work. You can keep references to other things in your 'struct file *', knowing that files are properly refcounted, and won't go away while you are dealing with them. The problem, of course, is that then epoll violates that "called with reference" part. epoll very much by design does *not* take references to the files it keeps track of, and then tears them down at close() time. Now, epoll has its reasons for doing that. They are even good reasons. But that does mean that when epoll needs to deal with that hackery. I wish we could remove epoll entirely, but that isn't an option, so we need to just make sure that when it accesses the ffd.file pointer, it does so more carefully. Linus
Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Fri, 3 May 2024 at 14:45, Al Viro wrote: > > How do you get through eventpoll_release_file() while someone > has entered ep_item_poll()? Doesn't matter. Look, it's enough that the file count has gone down to zero. You may not even have gotten to eventpoll_release_file() yet - the important part is that you're on your *way* to it. That means that the file will be released - and it means that you have violated all the refcounting rules for poll(). So I think you're barking up the wrong tree. Linus
Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Fri, 3 May 2024 at 15:07, Al Viro wrote: > > Suppose your program calls select() on a pipe and dmabuf, sees data to be read > from pipe, reads it, closes both pipe and dmabuf and exits. > > Would you expect that dmabuf file would stick around for hell knows how long > after that? I would certainly be very surprised by running into that... Why? That's the _point_ of refcounts. They make the thing they refcount stay around until it's no longer referenced. Now, I agree that dmabuf's are a bit odd in how they use a 'struct file' *as* their refcount, but hey, it's a specialty use. Unusual perhaps, but not exactly wrong. I suspect that if you saw a dmabuf just have its own 'refcount_t' and stay around until it was done, you wouldn't bat an eye at it, and it's really just the "it uses a struct file for counting" that you are reacting to. Linus
Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Fri, 3 May 2024 at 16:23, Kees Cook wrote: > > static bool __must_check get_dma_buf_unless_doomed(struct dma_buf *dmabuf) > { > return atomic_long_inc_not_zero(&dmabuf->file->f_count) != 0L; > } > > If we end up adding epi_fget(), we'll have 2 cases of using > "atomic_long_inc_not_zero" for f_count. Do we need some kind of blessed > helper to live in file.h or something, with appropriate comments? I wonder if we could try to abstract this out a bit more. These games with non-ref-counted file structures *feel* a bit like the games we play with non-ref-counted (aka "stashed") 'struct dentry' that got fairly recently cleaned up with path_from_stashed() when both nsfs and pidfs started doing the same thing. I'm not loving the TTM use of this thing, but at least the locking and logic feels a lot more straightforward (ie the atomic_long_inc_not_zero() here is clealy under the 'prime->mutex' lock IOW, the tty use looks correct to me, and it has fairly simple locking and is just catching the the race between 'fput()' decrementing the refcount and and 'file->f_op->release()' doing the actual release. You are right that it's similar to the epoll thing in that sense, it just looks a _lot_ more straightforward to me (and, unlike epoll, doesn't look actively buggy right now). Could we abstract out this kind of "stashed file pointer" so that we'd have a *common* form for this? Not just the inc_not_zero part, but the locking rule too? Linus
Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Fri, 3 May 2024 at 16:39, Al Viro wrote: > > *IF* those files are on purely internal filesystem, that's probably > OK; do that with something on something mountable (char device, > sysfs file, etc.) and you have a problem with filesystem staying > busy. Yeah, I agree, it's a bit annoying in general. That said, it's easy to do: stash a file descriptor in a unix domain socket, and that's basically exactly what you have: a random reference to a 'struct file' that will stay around for as long as you just keep that socket around, long after the "real" file descriptor has been closed, and entirely separately from it. And yes, that's exactly why unix domain socket transfers have caused so many problems over the years, with both refcount overflows and nasty garbage collection issues. So randomly taking references to file descriptors certainly isn't new. In fact, it's so common that I find the epoll pattern annoying, in that it does something special and *not* taking a ref - and it does that special thing to *other* ("innocent") file descriptors. Yes, dma-buf is a bit like those unix domain sockets in that it can keep random references alive for random times, but at least it does it just to its own file descriptors, not random other targets. So the dmabuf thing is very much a "I'm a special file that describes a dma buffer", and shouldn't really affect anything outside of active dmabuf uses (which admittedly is a large portion of the GPU drivers, and has been expanding from there...). I So the reason I'm annoyed at epoll in this case is that I think epoll triggered the bug in some entirely innocent subsystem. dma-buf is doing something differently odd, yes, but at least it's odd in a "I'm a specialized thing" sense, not in some "I screw over others" sense. Linus
Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Sat, 4 May 2024 at 02:37, Christian Brauner wrote: > > --- a/drivers/dma-buf/dma-buf.c > +++ b/drivers/dma-buf/dma-buf.c > @@ -244,13 +244,18 @@ static __poll_t dma_buf_poll(struct file *file, > poll_table *poll) > if (!dmabuf || !dmabuf->resv) > return EPOLLERR; > > + if (!get_file_active(&dmabuf->file)) > + return EPOLLERR; [...] I *really* don't think anything that touches dma-buf.c can possibly be right. This is not a dma-buf.c bug. This is purely an epoll bug. Lookie here, the fundamental issue is that epoll can call '->poll()' on a file descriptor that is being closed concurrently. That means that *ANY* driver that relies on *any* data structure that is managed by the lifetime of the 'struct file' will have problems. Look, here's sock_poll(): static __poll_t sock_poll(struct file *file, poll_table *wait) { struct socket *sock = file->private_data; and that first line looks about as innocent as it possibly can, right? Now, imagine that this is called from 'epoll' concurrently with the file being closed for the last time (but it just hasn't _quite_ reached eventpoll_release() yet). Now, imagine that the kernel is built with preemption, and the epoll thread gets preempted _just_ before it loads 'file->private_data'. Furthermore, the machine is under heavy load, and it just stays off its CPU a long time. Now, during this TOTALLY INNOCENT sock_poll(), in another thread, the file closing completes, eventpoll_release() finishes, and the preemption of the poll() thing just takes so long that you go through an RCU period too, so that the actual file has been released too. So now that totally innoced file->private_data load in the poll() is probably going to get random data. Yes, the file is allocated as SLAB_TYPESAFE_BY_RCU, so it's probably still a file. Not guaranteed, even the slab will get fully free'd in some situations. And yes, the above case is impossible to hit in practice. You have to hit quite the small race window with an operation that practically never happens in the first place. But my point is that the fact that the problem with file->f_count lifetimes happens for that dmabuf driver is not the fault of the dmabuf code. Not at all. It is *ENTIRELY* a bug in epoll, and the dmabuf code is probably just easier to hit because it has a poll() function that does things that have longer lifetimes than most things, and interacts more directly with that f_count. So I really don't understand why Al thinks this is "dmabuf does bad things with f_count". It damn well does not. dma-buf is the GOOD GUY here. It's doing things *PROPERLY*. It's taking refcounts like it damn well should. The fact that it takes ref-counts on something that the epoll code has messed up is *NOT* its fault. Linus
Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Sat, 4 May 2024 at 08:32, Linus Torvalds wrote: > > Now, during this TOTALLY INNOCENT sock_poll(), in another thread, the > file closing completes, eventpoll_release() finishes [..] Actually, Al is right that ep_item_poll() should be holding the ep->mtx, so eventpoll_release() -> eventpoll_release_file_file() -> mutex_lock(&ep->mtx) should block and the file doesn't actually get released. So I guess the sock_poll() issue cannot happen. It does need some poll() function that does 'fget()', and believes that it works. But because the f_count has already gone down to zero, fget() doesn't work, and doesn't keep the file around, and you have the bug. The cases that do fget() in poll() are probably race, but they aren't buggy. epoll is buggy. So my example wasn't going to work, but the argument isn't really any different, it's just a much more limited case that breaks. And maybe it's even *only* dma-buf that does that fget() in its ->poll() function. Even *then* it's not a dma-buf.c bug. Linus
Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Sat, 4 May 2024 at 08:40, Linus Torvalds wrote: > > And maybe it's even *only* dma-buf that does that fget() in its > ->poll() function. Even *then* it's not a dma-buf.c bug. They all do in the sense that they do poll_wait -> __pollwait -> get_file (*boom*) but the boom is very small because the poll_wait() will be undone by poll_freewait(), and normally poll/select has held the file count elevated. Except for epoll. Which leaves those pollwait entries around until it's done - but again will be held up on the ep->mtx before it does so. So everybody does some f_count games, but possibly dma-buf is the only one that ends up expecting to hold on to the f_count for longer periods. Linus
Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Sat, 4 May 2024 at 08:32, Linus Torvalds wrote: > > Lookie here, the fundamental issue is that epoll can call '->poll()' > on a file descriptor that is being closed concurrently. Thinking some more about this, and replying to myself... Actually, I wonder if we could *really* fix this by simply moving the eventpoll_release() to where it really belongs. If we did it in file_close_fd_locked(), it would actually make a *lot* more sense. Particularly since eventpoll actually uses this: struct epoll_filefd { struct file *file; int fd; } __packed; ie it doesn't just use the 'struct file *', it uses the 'fd' itself (for ep_find()). (Strictly speaking, it should also have a pointer to the 'struct files_struct' to make the 'int fd' be meaningful). IOW, eventpoll already considers the file _descriptor_ relevant, not just the file pointer, and that's destroyed at *close* time, not at 'fput()' time. Yeah, yeah, the locking situation in file_close_fd_locked() is a bit inconvenient, but if we can solve that, it would solve the problem in a fundamentally different way: remove the ep iterm before the file->f_count has actually been decremented, so the whole "race with fput()" would just go away entirely. I dunno. I think that would be the right thing to do, but I wouldn't be surprised if some disgusting eventpoll user then might depend on the current situation where the eventpoll thing stays around even after the close() if you have another copy of the file open. Linus
Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Sun, 5 May 2024 at 03:50, Christian Brauner wrote: > > And I agree with you that for some instances it's valid to take another > reference to a file from f_op->poll() but then they need to use > get_file_active() imho and simply handle the case where f_count is zero. I think this is (a) practically impossible to find (since most f_count updates are in various random helpers) (b) not tenable in the first place, since *EVERYBODY* does a f_count update as part of the bog-standard pollwait So (b) means that the notion of "warn if somebody increments f_count from zero" is broken to begin with - but it's doubly broken because it wouldn't find anything *anyway*, since this never happens in any normal situation. And (a) means that any non-automatic finding of this is practically impossible. > And we need to document that in Documentation/filesystems/file.rst or > locking.rst. WHY? Why cannot you and Al just admit that the problem is in epoll. Always has been, always will be. The fact is, it's not dma-buf that is violating any rules. It's epoll. It's calling out to random driver functions with a file pointer that is no longer valid. It really is that simple. I don't see why you are arguing for "unknown number of drivers - we know at least *one* - have to be fixed for a bug that is in epoll". If it was *easy* to fix, and if it was *easy* to validate, then sure. But that just isn't the case. In contrast, in epoll it's *trivial* to fix the one case where it does a VFS call-out, and just say "you have to follow the rules". So explain to me again why you want to mess up the driver interface and everybody who has a '.poll()' function, and not just fix the ONE clearly buggy piece of code. Because dammit,. epoll is clearly buggy. It's not enough to say "the file allocation isn't going away", and claim that that means that it's not buggy - when the file IS NO LONGER VALID! Linus
[PATCH v2] epoll: be better about file lifetimes
epoll can call out to vfs_poll() with a file pointer that may race with the last 'fput()'. That would make f_count go down to zero, and while the ep->mtx locking means that the resulting file pointer tear-down will be blocked until the poll returns, it means that f_count is already dead, and any use of it won't actually get a reference to the file any more: it's dead regardless. Make sure we have a valid ref on the file pointer before we call down to vfs_poll() from the epoll routines. Link: https://lore.kernel.org/lkml/2d631f0615918...@google.com/ Reported-by: syzbot+045b454ab35fd82a3...@syzkaller.appspotmail.com Reviewed-by: Jens Axboe Signed-off-by: Linus Torvalds --- Changes since v1: - add Link, Reported-by, and Jens' reviewed-by. And sign off on it because it looks fine to me and we have some testing now. - move epi_fget() closer to the user - more comments about the background - remove the rcu_read_lock(), with the comment explaining why it's not needed - note about returning zero rather than something like EPOLLERR|POLLHUP for a file that is going away fs/eventpoll.c | 42 +- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 882b89edc52a..a3f0f868adc4 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -979,6 +979,37 @@ static __poll_t __ep_eventpoll_poll(struct file *file, poll_table *wait, int dep return res; } +/* + * The ffd.file pointer may be in the process of + * being torn down due to being closed, but we + * may not have finished eventpoll_release() yet. + * + * Normally, even with the atomic_long_inc_not_zero, + * the file may have been free'd and then gotten + * re-allocated to something else (since files are + * not RCU-delayed, they are SLAB_TYPESAFE_BY_RCU). + * + * But for epoll, users hold the ep->mtx mutex, and + * as such any file in the process of being free'd + * will block in eventpoll_release_file() and thus + * the underlying file allocation will not be free'd, + * and the file re-use cannot happen. + * + * For the same reason we can avoid a rcu_read_lock() + * around the operation - 'ffd.file' cannot go away + * even if the refcount has reached zero (but we must + * still not call out to ->poll() functions etc). + */ +static struct file *epi_fget(const struct epitem *epi) +{ + struct file *file; + + file = epi->ffd.file; + if (!atomic_long_inc_not_zero(&file->f_count)) + file = NULL; + return file; +} + /* * Differs from ep_eventpoll_poll() in that internal callers already have * the ep->mtx so we need to start from depth=1, such that mutex_lock_nested() @@ -987,14 +1018,23 @@ static __poll_t __ep_eventpoll_poll(struct file *file, poll_table *wait, int dep static __poll_t ep_item_poll(const struct epitem *epi, poll_table *pt, int depth) { - struct file *file = epi->ffd.file; + struct file *file = epi_fget(epi); __poll_t res; + /* +* We could return EPOLLERR | EPOLLHUP or something, +* but let's treat this more as "file doesn't exist, +* poll didn't happen". +*/ + if (!file) + return 0; + pt->_key = epi->event.events; if (!is_file_epoll(file)) res = vfs_poll(file, pt); else res = __ep_eventpoll_poll(file, pt, depth); + fput(file); return res & epi->event.events; } -- 2.44.0.330.g4d18c88175
Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Sun, 5 May 2024 at 12:46, Al Viro wrote: > > I've no problem with having epoll grab a reference, but if we make that > a universal requirement ->poll() instances can rely upon, Al, we're note "making that a requirement". It always has been. Otgherwise, the docs should have shouted out DAMN LOUDLY that you can't rely on all the normal refcounting of 'struct file' THAT EVERY SINGLE NORMAL VFS FUNCTION CAN. Lookie herte: epoll is unimportant and irrelevant garbage compared to something fundamental like "read()", and what does read() do? It does this: struct fd f = fdget_pos(fd); if (f.file) { ... which is being DAMN CAREFUL to make sure that the file has the proper refcounts before it then calls "vfs_read()". There's a lot of very careful and subtle code in fdget_pos() to make this all proper, and that even if the file is closed by another thread concurrently, we *always* have a refcount to it, and it's always live over the whole 'vfs_read()' sequence. 'vfs_poll()' is NOT DIFFERENT in this regard. Not at all. Now, you have two choices that are intellectually honest: - admit that epoll() - which is a hell of a lot less important - should spend a small fraction of that effort on making its vfs_poll() use sane - say that all this fdget_pos() care is uncalled for in the read() path, and we should make all the filesystem .read() functions be aware that the file pointer they get may be garbage, and they should use get_file_active() to make sure every 'struct file *' use they have is safe? because if your choice is that "epoll can do whatever the f*&k it wants", then it's in clear violation of all the effort we go to in a MUCH MORE IMPORTANT code path, and is clearly not consistent or logical. Neither you nor Christian have explained why you think it's ok for that epoll() garbage to magically violate all our regular rules. Your claim that those regular rules are some new conditional requirement that we'd be imposing. NO. They are the rules that *anybody* who gets a 'struct file *' pointer should always be able to rely on by default: it's damn well a ref-counted thing, and the caller holds the refcount. The exceptional case is exactly the other way around: if you do random crap with unrefcounted poitners, it's *your* problem, and *you* are the one who has to be careful. Not some unrelated poor driver that didn't know about your f*&k-up. Dammit, epoll is CLEARLY BUGGY. It's passing off random kernel pointers without holding a refcount to them. THAT'S A BUG. And fixing that bug is *not* somehow changing existing rules as you are trying to claim. No. It's just fixing a bug. So stop claiming that this is some "new requirement". It is absolutely nothing of the sort. epoll() actively MISUSED file pointer, because file pointers are fundamentally refcounted (as are pretty much all sane kernel interfaces). Linus
Re: [PATCH v2] epoll: be better about file lifetimes
On Sun, 5 May 2024 at 13:02, David Laight wrote: > > How much is the extra pair of atomics going to hurt performance? > IIRC a lot of work was done to (try to) make epoll lockless. If this makes people walk away from epoll, that would be absolutely *lovely*. Maybe they'd start using io_uring instead, which has had its problems, but is a lot more capable in the end. Yes, doing things right is likely more expensive than doing things wrong. Bugs are cheap. That doesn't make buggy code better. Epoll really isn't important enough to screw over the VFS subsystem over. I did point out elsewhere how this could be fixed by epoll() removing the ep items at a different point: https://lore.kernel.org/all/CAHk-=wj6xl9mgcd_nuzrj6saken0tsyttzdfpgdw34r+zmz...@mail.gmail.com/ so if somebody actually wants to fix up epoll properly, that would probably be great. In fact, that model would allow epoll() to just keep a proper refcount as an fd is added to the poll events, and would probably fix a lot of nastiness. Right now those ep items stay around for basically random amounts of time. But maybe there are other ways to fix it. I don't think we have an actual eventpoll maintainer any more, but what I'm *not* willing to happen is eventpoll messing up other parts of the kernel. It was always a ugly performance hack, and was only acceptable as such. "ugly hack" is ok. "buggy ugly hack" is not. Linus
Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Sun, 5 May 2024 at 13:30, Al Viro wrote: > > 0. special-cased ->f_count rule for ->poll() is a wart and it's > better to get rid of it. > > 1. fs/eventpoll.c is a steaming pile of shit and I'd be glad to see > git rm taken to it. Short of that, by all means, let's grab reference > in there around the call of vfs_poll() (see (0)). Agreed on 0/1. > 2. having ->poll() instances grab extra references to file passed > to them is not something that should be encouraged; there's a plenty > of potential problems, and "caller has it pinned, so we are fine with > grabbing extra refs" is nowhere near enough to eliminate those. So it's not clear why you hate it so much, since those extra references are totally normal in all the other VFS paths. I mean, they are perhaps not the *common* case, but we have a lot of random get_file() calls sprinkled around in various places when you end up passing a file descriptor off to some asynchronous operation thing. Yeah, I think most of them tend to be special operations (eg the tty TIOCCONS ioctl to redirect the console), but it's not like vfs_ioctl() is *that* different from vfs_poll. Different operation, not somehow "one is more special than the other". cachefiles and backing-file does it for regular IO, and drop it at IO completion - not that different from what dma-buf does. It's in ->read_iter() rather than ->poll(), but again: different operations, but not "one of them is somehow fundamentally different". > 3. dma-buf uses of get_file() are probably safe (epoll shite aside), > but they do look fishy. That has nothing to do with epoll. Now, what dma-buf basically seems to do is to avoid ref-counting its own fundamental data structure, and replaces that by refcounting the 'struct file' that *points* to it instead. And it is a bit odd, but it actually makes some amount of sense, because then what it passes around is that file pointer (and it allows passing it around from user space *as* that file). And honestly, if you look at why it then needs to add its refcount to it all, it actually makes sense. dma-bufs have this notion of "fences" that are basically completion points for the asynchronous DMA. Doing a "poll()" operation will add a note to the fence to get that wakeup when it's done. And yes, logically it takes a ref to the "struct dma_buf", but because of how the lifetime of the dma_buf is associated with the lifetime of the 'struct file', that then turns into taking a ref on the file. Unusual? Yes. But not illogical. Not obviously broken. Tying the lifetime of the dma_buf to the lifetime of a file that is passed along makes _sense_ for that use. I'm sure dma-bufs could add another level of refcounting on the 'struct dma_buf' itself, and not make it be 1:1 with the file, but it's not clear to me what the advantage would really be, or why it would be wrong to re-use a refcount that is already there. Linus
Re: get_file() unsafe under epoll (was Re: [syzbot] [fs?] [io-uring?] general protection fault in __ep_remove)
On Mon, 6 May 2024 at 10:46, Stefan Metzmacher wrote: > > I think it's a very important detail that epoll does not take > real references. Otherwise an application level 'close()' on a socket > would not trigger a tcp disconnect, when an fd is still registered with > epoll. Yes, exactly. epoll() ends up actually wanting the lifetime of the ep item be the lifetime of the file _descriptor_, not the lifetime of the file itself. We approximate that - badly - with epoll not taking a reference on the file pointer, and then at final fput() it tears things down. But that has two real issues, and one of them is that "oh, now epoll has file pointers that are actually dead" that caused this thread. The other issue is that "approximates" thing, where it means that duplicating the file pointer (dup*() and fork() end unix socket file sending etc) will not mean that the epoll ref is also out of sync with the lifetime of the file descriptor. That's why I suggested that "clean up epoll references at file_close_fd() time instead: https://lore.kernel.org/all/CAHk-=wj6xl9mgcd_nuzrj6saken0tsyttzdfpgdw34r+zmz...@mail.gmail.com/ because it would actually really *fix* the lifetime issue of ep items. In the process, it would make it possible to actually take a f_count reference at EPOLL_CTL_ADD time, since now the lifetime of the EP wouldn't be tied to the lifetime of the 'struct file *' pointer, it would be properly tied to the lifetime of the actual file descriptor that you are adding. So it would be a huge conceptual cleanup too. (Of course - at that point EPOLL_CTL_ADD still doesn't actually _need_ a reference to the file, since the file being open in itself is already that reference - but the point here being that there would *be* a reference that the epoll code would effectively have, and you'd never be in the situation we were in where there would be stale "dead" file pointers that just haven't gone through the cleanup yet). But I'd rather not touch the epoll code more than I have to. Which is why I applied the minimal patch for just "refcount over vfs_poll()", and am just mentioning my suggestion in the hope that some eager beaver would like to see how painful it would do to make the bigger surgery... Linus
Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Tue, 7 May 2024 at 04:03, Daniel Vetter wrote: > > It's really annoying that on some distros/builds we don't have that, and > for gpu driver stack reasons we _really_ need to know whether a fd is the > same as another, due to some messy uniqueness requirements on buffer > objects various drivers have. It's sad that such a simple thing would require two other horrid models (EPOLL or KCMP). There'[s a reason that KCMP is a config option - *some* of that is horrible code - but the "compare file descriptors for equality" is not that reason. Note that KCMP really is a broken mess. It's also a potential security hole, even for the simple things, because of how it ends up comparing kernel pointers (ie it doesn't just say "same file descriptor", it gives an ordering of them, so you can use KCMP to sort things in kernel space). And yes, it orders them after obfuscating the pointer, but it's still not something I would consider sane as a baseline interface. It was designed for checkpoint-restore, it's the wrong thing to use for some "are these file descriptors the same". The same argument goes for using EPOLL for that. Disgusting hack. Just what are the requirements for the GPU stack? Is one of the file descriptors "trusted", IOW, you know what kind it is? Because dammit, it's *so* easy to do. You could just add a core DRM ioctl for it. Literally just struct fd f1 = fdget(fd1); struct fd f2 = fdget(fd2); int same; same = f1.file && f1.file == f2.file; fdput(fd1); fdput(fd2); return same; where the only question is if you also woudl want to deal with O_PATH fd's, in which case the "fdget()" would be "fdget_raw()". Honestly, adding some DRM ioctl for this sounds hacky, but it sounds less hacky than relying on EPOLL or KCMP. I'd be perfectly ok with adding a generic "FISAME" VFS level ioctl too, if this is possibly a more common thing. and not just DRM wants it. Would something like that work for you? Linus
Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Tue, 7 May 2024 at 11:04, Daniel Vetter wrote: > > On Tue, May 07, 2024 at 09:46:31AM -0700, Linus Torvalds wrote: > > > I'd be perfectly ok with adding a generic "FISAME" VFS level ioctl > > too, if this is possibly a more common thing. and not just DRM wants > > it. > > > > Would something like that work for you? > > Yes. > > Adding Simon and Pekka as two of the usual suspects for this kind of > stuff. Also example code (the int return value is just so that callers know > when kcmp isn't available, they all only care about equality): > > https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/util/os_file.c#L239 That example thing shows that we shouldn't make it a FISAME ioctl - we should make it a fcntl() instead, and it would just be a companion to F_DUPFD. Doesn't that strike everybody as a *much* cleaner interface? I think F_ISDUP would work very naturally indeed with F_DUPFD. Yes? No? Linus
Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Tue, 7 May 2024 at 12:07, Linus Torvalds wrote: > > That example thing shows that we shouldn't make it a FISAME ioctl - we > should make it a fcntl() instead, and it would just be a companion to > F_DUPFD. > > Doesn't that strike everybody as a *much* cleaner interface? I think > F_ISDUP would work very naturally indeed with F_DUPFD. So since we already have two versions of F_DUPFD (the other being F_DUPFD_CLOEXEC) I decided that the best thing to do is to just extend on that existing naming pattern, and called it F_DUPFD_QUERY instead. I'm not married to the name, so if somebody hates it, feel free to argue otherwise. But with that, the suggested patch would end up looking something like the attached (I also re-ordered the existing "F_LINUX_SPECIFIC_BASE" users, since one of them was out of numerical order). This really feels like a very natural thing, and yes, the 'same_fd()' function in systemd that Christian also pointed at could use this very naturally. Also note that I obviously haven't tested this. Because obviously this is trivially correct and cannot possibly have any bugs. Right? RIGHT? And yes, I did check - despite the odd jump in numbers, we've never had anything between F_NOTIFY (+2) and F_CANCELLK (+5). We added F_SETLEASE (+0) , F_GETLEASE (+1) and F_NOTIFY (+2) in 2.4.0-test9 (roughly October 2000, I didn't dig deeper). And then back in 2007 we suddenly jumped to F_CANCELLK (+5) in commit 9b9d2ab4154a ("locks: add lock cancel command"). I don't know why 3/4 were shunned. After that we had 22d2b35b200f ("F_DUPFD_CLOEXEC implementation") add F_DUPFD_CLOEXEC (+6). I'd have loved to put F_DUPFD_QUERY next to it, but +5 and +7 are both used. Linus fs/fcntl.c | 23 +++ include/uapi/linux/fcntl.h | 14 -- 2 files changed, 31 insertions(+), 6 deletions(-) diff --git a/fs/fcntl.c b/fs/fcntl.c index 54cc85d3338e..1ddb63f70445 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -327,6 +327,25 @@ static long fcntl_set_rw_hint(struct file *file, unsigned int cmd, return 0; } +/* + * Is the file descriptor a dup of the file? + */ +static long f_dupfd_query(int fd, struct file *filp) +{ + struct fd f = fdget_raw(fd); + + /* + * We can do the 'fdput()' immediately, as the only thing that + * matters is the pointer value which isn't changed by the fdput. + * + * Technically we didn't need a ref at all, and 'fdget()' was + * overkill, but given our lockless file pointer lookup, the + * alternatives are complicated. + */ + fdput(f); + return f.file == filp; +} + static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, struct file *filp) { @@ -342,6 +361,9 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, case F_DUPFD_CLOEXEC: err = f_dupfd(argi, filp, O_CLOEXEC); break; + case F_DUPFD_QUERY: + err = f_dupfd_query(argi, filp); + break; case F_GETFD: err = get_close_on_exec(fd) ? FD_CLOEXEC : 0; break; @@ -446,6 +468,7 @@ static int check_fcntl_cmd(unsigned cmd) switch (cmd) { case F_DUPFD: case F_DUPFD_CLOEXEC: + case F_DUPFD_QUERY: case F_GETFD: case F_SETFD: case F_GETFL: diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index 282e90aeb163..c0bcc185fa48 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -8,6 +8,14 @@ #define F_SETLEASE (F_LINUX_SPECIFIC_BASE + 0) #define F_GETLEASE (F_LINUX_SPECIFIC_BASE + 1) +/* + * Request nofications on a directory. + * See below for events that may be notified. + */ +#define F_NOTIFY (F_LINUX_SPECIFIC_BASE + 2) + +#define F_DUPFD_QUERY (F_LINUX_SPECIFIC_BASE + 3) + /* * Cancel a blocking posix lock; internal use only until we expose an * asynchronous lock api to userspace: @@ -17,12 +25,6 @@ /* Create a file descriptor with FD_CLOEXEC set. */ #define F_DUPFD_CLOEXEC (F_LINUX_SPECIFIC_BASE + 6) -/* - * Request nofications on a directory. - * See below for events that may be notified. - */ -#define F_NOTIFY (F_LINUX_SPECIFIC_BASE+2) - /* * Set and get of pipe page size array */
Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Wed, 8 May 2024 at 09:19, Linus Torvalds wrote: > > So since we already have two versions of F_DUPFD (the other being > F_DUPFD_CLOEXEC) I decided that the best thing to do is to just extend > on that existing naming pattern, and called it F_DUPFD_QUERY instead. > > I'm not married to the name, so if somebody hates it, feel free to > argue otherwise. Side note: with this patch, doing ret = fcntl(fd1, F_DUPFD_QUERY, fd2); will result in: -1 (EBADF): 'fd1' is not a valid file descriptor -1 (EINVAL): old kernel that doesn't support F_DUPFD_QUERY 0: fd2 does not refer to the same file as fd1 1: fd2 is the same 'struct file' as fd1 and it might be worth noting a couple of things here: (a) fd2 being an invalid file descriptor does not cause EBADF, it just causes "does not match". (b) we *could* use more bits for more equality IOW, it would possibly make sense to extend the 0/1 result to be - bit #0: same file pointer - bit #1: same path - bit #2: same dentry - bit #3: same inode which are all different levels of "sameness". Does anybody care? Do we want to extend on this "sameness"? I'm not convinced, but it might be a good idea to document this as a possibly future extension, ie *if* what you care about is "same file pointer", maybe you should make sure to only look at bit #0. Linus
Re: [Linaro-mm-sig] Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Thu, 9 May 2024 at 04:39, Christian Brauner wrote: > > Not worth it without someone explaining in detail why imho. First pass > should be to try and replace kcmp() in scenarios where it's obviously > not needed or overkill. Ack. > I've added a CLASS(fd_raw) in a preliminary patch since we'll need that > anyway which means that your comparison patch becomes even simpler imho. > I've also added a selftest patch: > > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs.misc LGTM. Maybe worth adding an explicit test for "open same file, but two separate opens, F_DUPFD_QUERY returns 0? Just to clarify the "it's not testing the file on the filesystem for equality, but the file pointer itself". Linus
Re: [git pull] drm for 6.10-rc1
On Tue, 14 May 2024 at 23:21, Dave Airlie wrote: > > In drivers the main thing is a new driver for ARM Mali firmware based > GPUs, otherwise there are a lot of changes to amdgpu/xe/i915/msm and > scattered changes to everything else. Hmm. There's something seriously wrong with amdgpu. I'm getting a ton of__force_merge warnings: WARNING: CPU: 0 PID: 1069 at drivers/gpu/drm/drm_buddy.c:199 __force_merge+0x14f/0x180 [drm_buddy] Modules linked in: hid_logitech_hidpp hid_logitech_dj uas usb_storage amdgpu drm_ttm_helper ttm video drm_exec drm_suballoc_helper amdxcp drm_buddy gpu_sched drm_display_helper drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel drm ghash_clmulni_intel igb atlantic nvme dca macsec ccp i2c_algo_bit nvme_core sp5100_tco wmi ip6_tables ip_tables fuse CPU: 0 PID: 1069 Comm: plymouthd Not tainted 6.9.0-07381-g3860ca371740 #60 Hardware name: Gigabyte Technology Co., Ltd. TRX40 AORUS MASTER/TRX40 AORUS MASTER, BIOS F7 09/07/2022 RIP: 0010:__force_merge+0x14f/0x180 [drm_buddy] Code: 74 0d 49 8b 44 24 18 48 d3 e0 49 29 44 24 30 4c 89 e7 ba 01 00 00 00 e8 9f 00 00 00 44 39 e8 73 1f 49 8b 04 24 e9 25 ff ff ff <0f> 0b 4c 39 c3 75 a3 eb 99 b8 f4 ff ff ff c3 b8 f4 ff ff ff eb 02 RSP: 0018:b87a81cb7908 EFLAGS: 00010246 RAX: 9b1915de8000 RBX: 9b1919478288 RCX: 0800 RDX: 9b19194782f8 RSI: 9b19194782d0 RDI: 9b19194782b0 RBP: R08: 9b1919478288 R09: 1000 R10: 0800 R11: R12: 9b192590fa18 R13: 000d R14: 1000 R15: FS: 7fa06bfa9740() GS:9b281e00() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 555adb857000 CR3: 00011b516000 CR4: 00350ef0 Call Trace: ? __force_merge+0x14f/0x180 [drm_buddy] drm_buddy_alloc_blocks+0x249/0x400 [drm_buddy] ? __cond_resched+0x16/0x40 amdgpu_vram_mgr_new+0x204/0x3f0 [amdgpu] ttm_resource_alloc+0x31/0x120 [ttm] ttm_bo_alloc_resource+0xbc/0x260 [ttm] ttm_bo_validate+0x9f/0x210 [ttm] ttm_bo_init_reserved+0x103/0x130 [ttm] amdgpu_bo_create+0x246/0x400 [amdgpu] ? amdgpu_bo_destroy+0x70/0x70 [amdgpu] amdgpu_bo_create_user+0x29/0x40 [amdgpu] amdgpu_mode_dumb_create+0x108/0x190 [amdgpu] ? amdgpu_bo_destroy+0x70/0x70 [amdgpu] ? drm_mode_create_dumb+0xa0/0xa0 [drm] drm_ioctl_kernel+0xad/0xd0 [drm] drm_ioctl+0x330/0x4b0 [drm] ? drm_mode_create_dumb+0xa0/0xa0 [drm] amdgpu_drm_ioctl+0x41/0x80 [amdgpu] __x64_sys_ioctl+0xd2a/0xe00 ? update_process_times+0x89/0xa0 ? tick_nohz_handler+0xe2/0x120 ? timerqueue_add+0x94/0xa0 ? __hrtimer_run_queues+0x12b/0x250 ? ktime_get+0x34/0xb0 ? lapic_next_event+0x12/0x20 ? clockevents_program_event+0x78/0xd0 ? hrtimer_interrupt+0x118/0x390 ? sched_clock+0x5/0x10 do_syscall_64+0x68/0x130 ? __irq_exit_rcu+0x53/0xb0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 and eventually the whole thing just crashes entirely, with a bad page state in the VM: BUG: Bad page state in process kworker/u261:13 pfn:31fb9a page: refcount:0 mapcount:0 mapping:ff0b239e index:0x37ce8 pfn:0x31fb9a aops:btree_aops ino:1 flags: 0x2fffc60020c(referenced|uptodate|workingset|node=0|zone=2|lastcpupid=0x3fff) page_type: 0x() which comes from a btrfs worker (btrfs-delayed-meta btrfs_work_helper), but I would not be surprised if that was caused by whatever odd thing is going on with the DRM code. IOW, it *looks* like this code ends up just corrupting memory in horrible ways. Linus Linus
Re: [git pull] drm for 6.10-rc1
On Wed, 15 May 2024 at 13:06, Linus Torvalds wrote: > > Hmm. There's something seriously wrong with amdgpu. > > I'm getting a ton of__force_merge warnings: > > WARNING: CPU: 0 PID: 1069 at drivers/gpu/drm/drm_buddy.c:199 > __force_merge+0x14f/0x180 [drm_buddy] Adding likely culprits to the participants, since it looks like this is all new with commit 96950929eb23 ("drm/buddy: Implement tracking clear page feature"). Sadly I can't juist revert that commit to check, because there are many subsequent commits that then depend on it. I guess I'll try to revert the later commit that enables it for amdgpu (commit a68c7eaa7a8f) and see if it at least makes the horrendous messages go away. Anyway, this is some old Radeon graphics card in my Threadripper: 49:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7) (prog-if 00 [VGA controller]) Subsystem: Sapphire Technology Limited Radeon RX 570 Pulse 4GB Flags: bus master, fast devsel, latency 0, IRQ 130, IOMMU group 32 Memory at c000 (64-bit, prefetchable) [size=256M] Memory at d000 (64-bit, prefetchable) [size=2M] I/O ports at 8000 [size=256] Memory at d1c0 (32-bit, non-prefetchable) [size=256K] Expansion ROM at 000c [disabled] [size=128K] Capabilities: Kernel driver in use: amdgpu Kernel modules: amdgpu I think it's a "Sapphire Radeon Pulse RX 580" or something like that. Linus
Re: [git pull] drm for 6.10-rc1
On Wed, 15 May 2024 at 13:21, Linus Torvalds wrote: > > I guess I'll try to revert the later commit that enables it for amdgpu > (commit a68c7eaa7a8f) and see if it at least makes the horrendous > messages go away. I have to revert both a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality") e362b7c8f8c7 ("drm/amdgpu: Modify the contiguous flags behaviour") to make things build cleanly. Next step: see if it boots and fixes the problem for me. Linus
Re: [git pull] drm for 6.10-rc1
On Wed, 15 May 2024 at 13:24, Linus Torvalds wrote: > > I have to revert both > > a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality") > e362b7c8f8c7 ("drm/amdgpu: Modify the contiguous flags behaviour") > > to make things build cleanly. Next step: see if it boots and fixes the > problem for me. Well, perhaps not surprisingly, the WARN_ON() no longer triggers with this, and everything looks fine. Let's see if the machine ends up being stable now. It took several hours for the "scary messages" state to turn into the "hung machine" state, so they *could* have been independent issues, but it seems a bit unlikely. Linus
Re: [git pull] drm for 6.10-rc1
On Tue, 14 May 2024 at 23:21, Dave Airlie wrote: > > This is the main pull request for the drm subsystems for 6.10. .. and now that I look more at this pull request, I find other things wrong. Why is the DRM code asking if I want to enable -Werror? I have Werror enabled *already*. I hate stupid config questions. They only confuse users. If the global WERROR config is enabled, then the DRM config certainly shouldn't ask whether you want even more -Werror. It does nothing but annoy people. And no, we are not going to have subsystems that can *weaken* the existing CONFIG_WERROR. Happily, that doesn't seem to be what the DRM code wants to do, it just wants to add -Werror, but as mentioned, its' crazy to do that when we already have it globally enabled. Now, it might make more sense to ask if you want -Wextra. A lot of those warnings are bogus. Linus
Re: [git pull] drm for 6.10-rc1
On Wed, 15 May 2024 at 15:45, Dave Airlie wrote: > > The drm subsystem enables more warnings than the kernel default, so > this config option is disabled by default. Irrelevant. If the *main* CONFIG_WERROR is on, then it does NOT MATTER if somebody sets CONFIG_DRM_WERROR or not. It's a no-op. It's pointless. And that means that it's also entirely pointless to ask. It's only annoying. > depends on DRM && EXPERT > > so we aren't throwing it at random users. Yes you are. Because - rightly or wrongly - distros enable EXPERT by default. At least Fedora does. So any user that starts from a distro config will have EXPERT enabled. > should we rename it CONFIG_DRM_WERROR_MORE or something? Renaming does nothing. If it's pointless, it's pointless even if it's renamed. It needs to have a depends on !WERROR because if WERROR is already true, then it's stupid and wrong to ask AGAIN. To summarize: if the main WERROR is enabled, then the DRM tree is *ALREADY* built with WERROR. Asking for DRM_WERROR is wrong. I keep harping on bad config variables because our kernel config thing is already much too messy and is by far the most difficult part of building your own kernel. Everything else is literally just "make" followed by "make modules_install" and "make install". Very straightforward. But doing a kernel config? Nasty. And made nastier by bad and nonsensical questions. Linus
Re: [git pull] drm for 6.10-rc1
On Wed, 15 May 2024 at 16:17, Dave Airlie wrote: > > It's also possible it's just that hey there's a few others in the tree > > KVM_WERROR not tied to it > PPC_WERROR (why does CXL uses this?) Yeah, that should be fixed too, but at least KVM_WERROR predates the whole-kernel WERROR. And PPC_WERROR predates it by over a decade. But yes, good catch - both of those should be silenced if we already have the global WERROR enabled. I mainly notice new questions (because I use "make oldconfig"), so old pre-existing illogical ones don't trigger my "why are they asking?" reaction. > AMDGPU, I915 and XE all have !COMPILE_TEST on their variants Hmm. It turns out that I didn't notice the AMDGPU one because my Threadripper - that has AMDGPU enabled - I have actually turned off EXPERT on, so it's hidden by that for me. But yes, both of those should be "depends on !WERROR" too. Or maybe they should just go away entirely, and be subsumed by the DRM_WERROR thing. Linus
Re: [git pull] drm for 6.10-rc1
On Wed, 15 May 2024 at 16:51, Dave Airlie wrote: > > > Let's see if the machine ends up being stable now. It took several > > hours for the "scary messages" state to turn into the "hung machine" > > state, so they *could* have been independent issues, but it seems a > > bit unlikely. > > This worries me actually, it's possible this warn could cause a > problem, but I'm not convinced it should have machine ending > properties without some sort of different error at the end, so I'd > keep an eye open here. Well, since I'm a big believer in dogfooding, I always run my own kernel even during the merge window. I don't reboot between each pull, but I try to basically reboot daily. And it's entirely possible that the eventual "bad page flags" error - which is what I think triggered the eventual hang - is something else that came in during this merge window. I haven't actually gotten the -mm changes from Andrew yet, but it did happen in the btrfs kworker, and I have merged the btrfs changes for 6.10. So maybe they are the cause. I was blaming the DRM case mainly because it clearly *was* about some kind of allocation management, and I got a *lot* of those warnings: $ journalctl -b -1 | grep 'WARNING: CPU' | wc -1 16015 but let's see if it happens with my amdgpu reverts in place, and no drm warnings. It most definitely wouldn't be the first time we had multiple independent bugs during the merge window ;/ Linus
Re: [git pull] drm urgent for 6.10-rc1
On Wed, 15 May 2024 at 19:54, Dave Airlie wrote: > > Here is the buddy allocator fix I picked up from the list, please apply. So I removed my reverts, and am running a kernel that includes the merge 972a2543e3dd ("Merge tag 'drm-next-2024-05-16' of https://gitlab.freedesktop.org/drm/kernel";) but I still see a lot of warnings as per below. I was going to say that the difference is that now they trigger through the page fault path (amdgpu_gem_fault) while previously they triggered through the system call path and amdgpu_drm_ioctl. But it turns out it's both in both cases, and it just happened to be one or the other in the particular warnings that I cut-and-pasted. As before, there are tens of thousands of them after being up for less than an hour, so this is not some kind of rare thing. The machine hasn't _crashed_ yet, though. But I'm going to be out and about and working on my laptop the rest of the day, so I won't be able to test. (And that kernel version of "6.9.0-08295-gfd39ab3b5289" that is quoted in the WARN isn't some official kernel, I have about ten private patches that I keep testing in my tree, so if you wondered what the heck that git version is, it's not going to match anything you see, but the ~ten patches also aren't relevant to this). Nothing unusual in the config, although this is clang-built. Shouldn't matter, never has before. Linus --- CPU: 28 PID: 3326 Comm: mutter-x11-fram Tainted: GW 6.9.0-08295-gfd39ab3b5289 #64 Hardware name: Gigabyte Technology Co., Ltd. TRX40 AORUS MASTER/TRX40 AORUS MASTER, BIOS F7 09/07/2022 RIP: 0010:__force_merge+0x14f/0x180 [drm_buddy] Code: 74 0d 49 8b 44 24 18 48 d3 e0 49 29 44 24 30 4c 89 e7 ba 01 00 00 00 e8 9f 00 00 00 44 39 e8 73 1f 49 8b 04 24 e9 25 ff ff ff <0f> 0b 4c 39 c3 75 a3 eb 99 b8 f4 ff ff ff c3 b8 f4 ff ff ff eb 02 RSP: :9e350314baa0 EFLAGS: 00010246 RAX: 974a227a4a00 RBX: 974a2d024b88 RCX: 0b8eb800 RDX: 974a2d024bf8 RSI: 974a2d024bd0 RDI: 974a2d024bb0 RBP: R08: 974a2d024b88 R09: 1000 R10: 0800 R11: R12: 974a2198fa18 R13: 0009 R14: 1000 R15: FS: 7f56a78b6540() GS:97591e70() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f568804 CR3: 000198cc9000 CR4: 00350ef0 Call Trace: ? __warn+0xc1/0x190 ? __force_merge+0x14f/0x180 [drm_buddy] ? report_bug+0x129/0x1a0 ? handle_bug+0x3d/0x70 ? exc_invalid_op+0x16/0x40 ? asm_exc_invalid_op+0x16/0x20 ? __force_merge+0x14f/0x180 [drm_buddy] drm_buddy_alloc_blocks+0x249/0x400 [drm_buddy] ? __cond_resched+0x16/0x40 amdgpu_vram_mgr_new+0x204/0x3f0 [amdgpu] ttm_resource_alloc+0x31/0x120 [ttm] ttm_bo_alloc_resource+0xbc/0x260 [ttm] ? memcg_account_kmem+0x4a/0xe0 ? ttm_resource_compatible+0xbb/0xe0 [ttm] ttm_bo_validate+0x9f/0x210 [ttm] ? __alloc_pages+0x129/0x210 amdgpu_bo_fault_reserve_notify+0x98/0x110 [amdgpu] amdgpu_gem_fault+0x53/0xd0 [amdgpu] __do_fault+0x41/0x140 do_pte_missing+0x453/0xfd0 handle_mm_fault+0x73c/0x1090 do_user_addr_fault+0x2e2/0x6f0 exc_page_fault+0x56/0x110 asm_exc_page_fault+0x22/0x30
Re: [git pull] drm urgent for 6.10-rc1
On Thu, 16 May 2024 at 18:08, Dave Airlie wrote: > > Linus, do you see it a boot straight away? Ok, back at that computer now, and yes, I see those messages right away. In fact, they seem to happen before gnome even starts up, ie I see those messages long before the first messages from gnome-session: May 17 12:07:17 tr3970x kernel: WARNING: CPU: 4 PID: 1067 at drivers/gpu/drm/drm_buddy.c:198 __force_merge+0x184/0x1b0 [drm_buddy] .. lots and lots and lots of them .. ... May 17 12:07:23 tr3970x systemd-cryptsetup[982]: ... ... May 17 12:07:25 tr3970x systemd[1]: Reached target basic.target ... May 17 12:07:25 tr3970x systemd[1]: Mounted sysroot.mount - /sysroot. ... May 17 12:07:25 tr3970x systemd[1]: Switching root. ... May 17 12:07:36 tr3970x gnome-session[2824]: .. ... May 17 12:07:36 tr3970x gnome-shell[2836]: Obtained a high priority EGL context May 17 12:07:36 tr3970x kernel: WARNING: CPU: 31 PID: 2836 at drivers/gpu/drm/drm_buddy.c:198 __force_merge+0x184/0x1b0 [drm_buddy] .. lots of warnings resume ... IOW, it happens already during the graphical boot before I have even typed in my disk encryption password. Then it starts again when gnome starts. I just checked: I have exactly 8192 warnings from the early boot before the first gnome warning. Which sounds like too round a number to be an accident. I will try the patch Alex pointed at next: https://patchwork.freedesktop.org/patch/594539/ and see if that fixes it for me. Linus
Re: [git pull] drm urgent for 6.10-rc1
On Fri, 17 May 2024 at 06:55, Alex Deucher wrote: > > Can you try this patch? > https://patchwork.freedesktop.org/patch/594539/ Ack. This seems to fix it for me - unless the problem is random and only happens sometimes, and I've just been *very* unlucky until now. Linus
Re: [PATCH 0/2] docs & checkpatch: allow Closes tags with links
On Thu, Mar 16, 2023 at 4:43 AM Matthieu Baerts wrote: > > @Linus: in short, we would like to continue using the "Closes:" tag (or > similar, see below) with a URL in commit messages. They are useful to > have public bug trackers doing automated actions like closing a specific > ticket. Any objection from your side? As long as it's a public link, I guess that just documents what the drm people have been doing. I'm not convinced "Closes" is actually any better than just "Link:", though. I would very much hope and expect that the actual closing of any bug report is actually done separately and verified, rather than some kind of automated "well, the commit says it closes it, so.." So honestly, I feel like "Link:" is just a better thing, and I worry that "Closes:" is then going to be used for random internal crap. We've very much seen people wanting to do that - having their own private bug trackers, and then using the commit message to refer to them, which I am *violently* against. If it's only useful to some closed community, it shouldn't be in the public commits. And while the current GPU people seem to use "Closes:" the right way (and maybe some other groups do too - but it does seem to be mostly a freedesktop thing), I really think it is amenable to mis-use in ways "Link:" is not. The point of "Link:" is explicitly two-fold: - it makes it quite obvious that you expect an actual valid web-link, not some internal garbage - random people always want random extensions, and "Link:" is _designed_ to counter-act that creeping "let's add a random new tag" disease. It's very explicitly "any relevant link". and I really question the value of adding new types of tags, particularly ones that seem almost designed to be mis-used. So I'm not violently against it, and 99% of the existing uses seem fine. But I do note that some of the early "Closes:" tags in the kernel were very much complete garbage, and exactly the kind of thing that I absolutely detest. What does Closes: 10437 mean? That's crazy talk. (And yes, in that case it was a kernel.bugzilla.org number, which is perfectly fine, but I'm using it as a very real example of how "Closes:" ends up being very naturally to mis-use). End result: I don't hate our current "Closes:" uses. But I'm very wary of it. I'm not at all convinced that it really adds a lot of value over "Link:", and I am, _very_ aware of how easily it can be then taken to be a "let's use our own bug tracker cookies here". So I will neither endorse nor condemn it, but if I see people using it wrong, I will absolutely put my foot down. Linus
Re: [Intel-gfx] [BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete()
On Wed, Mar 15, 2023 at 5:22 PM Steven Rostedt wrote: > > I hope that this gets in by -rc3, as I want to start basing my next branch > on that tag. My tree should have it now as commit c00133a9e87e ("drm/ttm: drop extra ttm_bo_put in ttm_bo_cleanup_refs"). Linus
Re: Linux 6.3-rc3
On Mon, Mar 20, 2023 at 11:05 AM Nathan Chancellor wrote: > > On the clang front, I am still seeing the following warning turned error > for arm64 allmodconfig at least: > > drivers/gpu/host1x/dev.c:520:6: error: variable 'syncpt_irq' is > uninitialized when used here [-Werror,-Wuninitialized] > if (syncpt_irq < 0) > ^~ Hmm. I do my arm64 allmodconfig builds with gcc, and I'm surprised that gcc doesn't warn about this. That syncpt_irq thing isn't written to anywhere, so that's pretty egregious. We use -Wno-maybe-uninitialized because gcc gets it so wrong, but that's different from the "-Wuninitialized" thing (without the "maybe"). I've seen gcc mess this up when there is one single assignment, because then the SSA format makes it *so* easy to just use that assignment out-of-order (or unconditionally), but this case looks unusually clear-cut. So the fact that gcc doesn't warn about it is outright odd. > If that does not come to you through other means before -rc4, could you > just apply it directly so that I can stop applying it to our CI? :) Bah. I took it now, there's no excuse for that thing. Do we have any gcc people around that could explain why gcc failed so miserably at this trivial case? Linus
Re: Linux 6.3-rc3
On Mon, Mar 20, 2023 at 11:26 AM Linus Torvalds wrote: > > Hmm. I do my arm64 allmodconfig builds with gcc, and I'm surprised > that gcc doesn't warn about this. Side note: I'm also wondering why that TEGRA_HOST1X config has that ARM dependency in depends on ARCH_TEGRA || (ARM && COMPILE_TEST) because it seems to build just fine at least on x86-64 if I change it to be just depends on ARCH_TEGRA || COMPILE_TEST ie there seems to be nothing ARM-specific in there. Limiting it to just the tegra platform by default makes 100% sense, but that "only do compile-testing on ARM" seems a bit bogus. That limit goes back to forever (commit 6f44c2b5280f: "gpu: host1x: Increase compile test coverage" back in Nov 2013), so maybe things didn't use to work as well back in the dark ages? None of this explains why gcc didn't catch it, but at least allowing the build on x86-64 would likely have made it easier for people to see clang catching this. Linus
Re: Linux 6.3-rc3
On Mon, Mar 20, 2023 at 11:56 AM Nathan Chancellor wrote: > > I did see a patch fly by to fix that: > > https://lore.kernel.org/20230316082035.567520-3-christian.koe...@amd.com/ > > It seems like the DRM_TEGRA half of it is broken though: > > https://lore.kernel.org/202303170635.a2rsq1wu-...@intel.com/ Hmm. x86-64 has 'vmap()' too. So I think that DRM_TEGRA breakage is likely just due to a missing header file include that then (by luck and mistake) gets included on arm. You need for 'vmap()'. There might be something else going on, I didn't look deeply at it. Linus
Re: Linux 6.3-rc3
On Mon, Mar 20, 2023 at 1:05 PM Guenter Roeck wrote: > > I have noticed that gcc doesn't always warn about uninitialized variables > in most architectures. Yeah, I'm getting the feeling that when the gcc people were trying to make -Wmaybe-uninitialized work better (when moving it into "-Wall"), they ended up moving a lot of "clearly uninitialized" cases into it. So then because we disable the "maybe" case (with -Wno-maybe-uninitialized) because it had too many random false positives, we end up not seeing the obvious cases either. Linus
Re: Linux 6.3-rc3
On Mon, Mar 20, 2023 at 3:06 PM Nathan Chancellor wrote: > > Right, this seems like a subtle difference in semantics between > -Wuninitialized between clang and GCC. I guess it's a bit ambiguous whether it's "X may be USED uninitialized" or whether it is "X may BE uninitialized" and then depending on how you see that ambiguity, the control flow matters. In this case, there is absolutely no question that the variable is uninitialized (since there is no write to it at all). So it is very clearly and unambiguously uninitialized. And I do think that as a result, "-Wuninitialized" should warn. But at the same time, whether it is *used* or not depends on that conditional, so I can see how it could be confusing and not be so clear an unambiguous. On the whole, I do wish that the logic would be "after dead code removal, if some pseudo has no initializer, it should always warn, regardless of any remaining dynamic conditoinals". That "after dead code removal" might matter, because I could see where config things (#ifdef's etc) would just remove the initialization of some variable, and if the use is behind some static "if (0)", then warning about it is all kinds of silly. Linus
Re: Linux 6.3-rc3
On Wed, Mar 22, 2023 at 9:40 AM Sedat Dilek wrote: > > You have to pass `make LLVM=1` in any case... to `oldconfig` or when > adding any MAKEFLAGS like -j${number-of-available-cpus}. I actually think we should look (again) at just making the compiler choice (and the prefix) be a Kconfig option. That would simplify *so* many use cases. It used to be that gcc was "THE compiler" and anything else was just an odd toy special case, but that's clearly not true any more. So it would be lovely to make the kernel choice a Kconfig choice - so you'd set it only at config time, and then after that a kernel build wouldn't need special flags any more, and you'd never need to play games with GNUmakefile or anything like that. Yes, you'd still use environment variables (or make arguments) for that initial Kconfig, but that's no different from the other environment variables we already have, like KCONFIG_SEED that kconfig uses internally, but also things like "$(ARCH)" that we already use *inside* the Kconfig files themselves. I really dislike how you have to set ARCH and CROSS_COMPILE etc externally, and can't just have them *in* the config file. So when you do cross-compiles, right now you have to do something like make ARCH=i386 allmodconfig to build the .config file, but then you have to *repeat* that ARCH=i386 when you actually build things: make ARCH=i386 because the ARCH choice ends up being in the .config file, but the makefiles themselves always take it from the environment. There are good historical reasons for our behavior (and probably a number of extant practical reasons too), but it's a bit annoying, and it would be lovely if we could start moving away from this model. Linus
Re: [PATCH v13 5/9] drm/i915: Check for integer truncation on scatterlist creation
On Wed, Sep 28, 2022 at 1:15 AM Gwan-gyeong Mun wrote: > > + if (check_assign(obj->base.size >> PAGE_SHIFT, &npages)) > + return -E2BIG; I have to say, I find that new "check_assign()" macro use to be disgusting. It's one thing to check for overflows. It's another thing entirely to just assign something to a local variable. This disgusting "let's check and assign" needs to die. It makes the code a completely unreadable mess. The "user" wersion is even worse. If you worry about overflow, then use a mix of (a) use a sufficiently large type to begin with (b) check for value range separately and in this particular case, I also suspect that the whole range check should have been somewhere else entirely - at the original creation of that "obj" structure, not at one random end-point where it is used. In other words, THIS WHOLE PATCH is just end-points checking the size requirements of that "base.size" thing much too late, when it should have been checked originally for some "maximum acceptable base size" instead. And that "maximum acceptable base size" should *not* be about "this is the size of the variables we use". It should be a sanity check of "this value is sane and fits in sane use cases". Because "let's plug security checks" is most definitely not about picking random assignments and saying "let's check this one". It's about trying to catch things earlier than that. Kees, you need to reign in the craziness in overflow.h. Linus
Re: [git pull] drm for 6.1-rc1
On Tue, Oct 4, 2022 at 8:42 PM Dave Airlie wrote: > > This is very conflict heavy, mostly the correct answer is picking > the version from drm-next. Ugh, yes, that was a bit annoying. I get the same end result as you did, but I do wonder if the drm people should try to keep some kind of separate "fixes" branches for things that go both into the development tree and then get sent to me for fixes pulls? Hopefully this "lots of pointless noise" was a one-off, but it might be due to how you guys work.. Linus
Re: [git pull] drm for 6.1-rc1
On Tue, Oct 4, 2022 at 8:42 PM Dave Airlie wrote: > > Lots of stuff all over, some new AMD IP support and gang > submit support [..] Hmm. I have now had my main desktop lock up twice after pulling this. Nothing in the dmesg after a reboot, and nothing in particular that seems to trigger it, so I have a hard time even guessing what's up, but the drm changes are the primary suspect. I will try to see if I can get any information out of the machine, but with the symptom being just a dead machine ... This is the same (old) Radeon device: 49:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7) with dual 4k monitors, running on my good old Threadripper setup. Again, there's no explicit reason to blame the drm pull, except that it started after that merge (that machine ran the kernel with the networking pull for a day with no problems, and while there were other pull requests in between them, they seem to be fairly unrelated to the hardware I have). But the lockup is so sporadic (twice in the last day) that I really can't bisect it, so I'm afraid I have very very little info. Any suggestions? Linus
Re: mainline build failure due to 5d8c3e836fc2 ("drm/amd/display: fix array-bounds error in dc_stream_remove_writeback()")
On Thu, Oct 6, 2022 at 1:51 AM Sudip Mukherjee (Codethink) wrote: > > This is only seen with gcc-11, gcc-12 builds are ok. Hmm. This seems to be some odd gcc issue. I *think* that what is going on is that the test j = 0 ; j < MAX_DWB_PIPES makes gcc decide that "hey, j is in the range [0,MAX_DWB_PIPES[", and then since MAX_DWB_PIPES is 1, that simplifies to "j must be zero". Good range analysis so far. But for 'i', we start off with that lower bound of 0, but the upper bound is not fixed (the test for "i" is: "i < stream->num_wb_info"). And then that "if (i != j)", so now gcc decides that it can simplify that to "if (i != 0)", and then simplifies *that* to "oh, the lower bound of 'i' in that code is actually 1. So then it decides that "stream->writeback_info[i]" must be out of bounds. Of course, the *reality* is that stream->num_wb_info should be <= MAX_DWB_PIPES, and as such (with the current MAX_DWB_PIPES value of 1) it's not that 'i' can be 1, it's that the code in question cannot be reached at all. What confuses me is that error message ("array subscript [0, 0] is outside array bounds of 'struct dc_writeback_info[1]') which seems to be aware that the value is actually 0. So this seems to be some gcc-11 range analysis bug, but I don't know what the fix is. I suspect some random code change would magically just make gcc realize it's ok after all, but since it all depends on random gcc confusion, I don't know what the random code change would be. The fix *MAY* be to just add a '&& i < MAX_DWB_PIPES' to that loop too, and then gcc will see that both i and j are always 0, and that the code is unreachable and not warn about it. Hmm? Can you test that? And the reason gcc-12 builds are ok probably isn't that gcc-12 gets this right, it's simply that gcc-12 gets so many *opther* things wrong that we already disabled -Warray-bounds with gcc-12 entirely. If somebody cannot come up with a fix, I suspect the solution is "gcc array bounds analysis is terminally buggy" and we just need to disable it for gcc-11 too. Kees, any idea? Who else might be interested in fixing a -Warray-bounds issue? Linus
Re: [git pull] drm for 6.1-rc1
On Thu, Oct 6, 2022 at 12:30 PM Dave Airlie wrote: > > netconsole? I've never been really successful with that in the past, and haven't used it for decades. I guess I could try if nothing else works. Linus
Re: [git pull] drm for 6.1-rc1
On Thu, Oct 6, 2022 at 12:28 PM Alex Deucher wrote: > > Maybe you are seeing this which is an issue with GPU TLB flushes which > is kind of sporadic: > https://gitlab.freedesktop.org/drm/amd/-/issues/2113 Well, that seems to be 5.19, and while timing changes (or whatever other software updates) could have made it start trigger, this machine has been pretty solid otgerwise. > Are you seeing any GPU page faults in your kernel log? Nothing even remotely like that "no-retry page fault" in that issue report. Of course, if it happens just before the whole thing locks up... Linus
Re: [git pull] drm for 6.1-rc1
On Thu, Oct 6, 2022 at 1:25 PM Dave Airlie wrote: > > > [ 1234.778760] BUG: kernel NULL pointer dereference, address: 0088 > [ 1234.778813] RIP: 0010:drm_sched_job_done.isra.0+0xc/0x140 [gpu_sched] As far as I can tell, that's the line struct drm_gpu_scheduler *sched = s_fence->sched; where 's_fence' is NULL. The code is 0: 0f 1f 44 00 00nopl 0x0(%rax,%rax,1) 5: 41 54push %r12 7: 55push %rbp 8: 53push %rbx 9: 48 89 fb mov%rdi,%rbx c:* 48 8b af 88 00 00 00 mov0x88(%rdi),%rbp <-- trapping instruction 13: f0 ff 8d f0 00 00 00 lock decl 0xf0(%rbp) 1a: 48 8b 85 80 01 00 00 mov0x180(%rbp),%rax and that next 'lock decl' instruction would have been the atomic_dec(&sched->hw_rq_count); at the top of drm_sched_job_done(). Now, as to *why* you'd have a NULL s_fence, it would seem that drm_sched_job_cleanup() was called with an active job. Looking at that code, it does if (kref_read(&job->s_fence->finished.refcount)) { /* drm_sched_job_arm() has been called */ dma_fence_put(&job->s_fence->finished); ... but then it does job->s_fence = NULL; anyway, despite the job still being active. The logic of that kind of "fake refcount" escapes me. The above looks fundamentally racy, not to say pointless and wrong (a refcount is a _count_, not a flag, so there could be multiple references to it, what says that you can just decrement one of them and say "I'm done"). Now, _why_ any of that happens, I have no idea. I'm just looking at the immediate "that pointer is NULL" thing, and reacting to what looks like a completely bogus refcount pattern. But that odd refcount pattern isn't new, so it's presumably some user on the amd gpu side that changed. The problem hasn't happened again for me, but that's not saying a lot, since it was very random to begin with. Linus
Re: mainline build failure due to 5d8c3e836fc2 ("drm/amd/display: fix array-bounds error in dc_stream_remove_writeback()")
On Thu, Oct 6, 2022 at 1:50 PM Sudip Mukherjee wrote: > > > And it looks like Sudip's proposed fix for this particular code is > > additionally fixing unsigned vs signed as well. I think -Warray-bounds > > did its job (though, with quite a confusing index range in the report). > > Not my. Linus's. I just tested. :) I suspect Kees meant Stephen's other patch that Hamza pointed at, and that is perhaps the cleaner version. That said, I hate how this forces us to write random code changes just to make a compiler just randomly _happen_ to not complain about it. Linus
Re: [git pull] drm fixes for 6.1-rc1
On Thu, Oct 13, 2022 at 5:29 PM Dave Airlie wrote: > > Round of fixes for the merge window stuff, bunch of amdgpu and i915 > changes, this should have the gcc11 warning fix, amongst other > changes. Some of those amd changes aren't "fixes". They are some major code changes. We're still in the merge window, so I'm letting it slide, but calling then "fixes" really stretches things. They are fixes exactly the same way completely new development can "fix" things. Linus
Re: [PATCH] drm/amd/display: Fix build breakage with CONFIG_DEBUG_FS=n
On Fri, Oct 14, 2022 at 8:22 AM Nathan Chancellor wrote: > > After commit 8799c0be89eb ("drm/amd/display: Fix vblank refcount in vrr > transition"), a build with CONFIG_DEBUG_FS=n is broken due to a > misplaced brace, along the lines of: Thanks, applied. Linus
Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)
[ Back from travel, so trying to make sense of this series.. ] On Sun, Jan 8, 2023 at 7:33 PM Byungchul Park wrote: > > I've been developing a tool for detecting deadlock possibilities by > tracking wait/event rather than lock(?) acquisition order to try to > cover all synchonization machanisms. It's done on v6.2-rc2. Ugh. I hate how this adds random patterns like if (timeout == MAX_SCHEDULE_TIMEOUT) sdt_might_sleep_strong(NULL); else sdt_might_sleep_strong_timeout(NULL); ... sdt_might_sleep_finish(); to various places, it seems so very odd and unmaintainable. I also recall this giving a fair amount of false positives, are they all fixed? Anyway, I'd really like the lockdep people to comment and be involved. We did have a fairly recent case of "lockdep doesn't track page lock dependencies because it fundamentally cannot" issue, so DEPT might fix those kinds of missing dependency analysis. See https://lore.kernel.org/lkml/60d41f05f139a...@google.com/ for some context to that one, but at teh same time I would *really* want the lockdep people more involved and acking this work. Maybe I missed the email where you reported on things DEPT has found (and on the lack of false positives)? Linus
Re: [pull] amdgpu drm-fixes-6.4
On Fri, 23 Jun 2023 at 14:18, Alex Deucher wrote: > > are available in the Git repository at: > > https://gitlab.freedesktop.org/agd5f/linux.git > tags/amd-drm-fixes-6.4-2023-06-23 That's not a valid signed tag. Yes, it's a tag. But it doesn't actually have any cryptographic signing, and I'm not willing to pull random content from git sites that I can't verify. In fact, these days I ask even kernel.org pull requests to be proper signed tags, although I haven't really gotten to the point where I *require* it. So please sign your tags - use "git tag -s" (or "-u keyname" if you have some specific key you want to use, rather than one described by your regular git config file). Linus
Re: [RFC][PATCH v3 00/33] timers: Use timer_shutdown*() before freeing timers
On Thu, Nov 3, 2022 at 10:48 PM Steven Rostedt wrote: > > Ideally, I would have the first patch go into this rc cycle, which is mostly > non functional as it will allow the other patches to come in via the > respective > subsystems in the next merge window. Ack. I also wonder if we could do the completely trivially correct conversions immediately. I'm talking about the scripted ones where it's currently a "del_timer_sync()", and the very next action is freeing whatever data structure the timer is in (possibly with something like free_irq() in between - my point is that there's an unconditional free that is very clear and unambiguous), so that there is absolutely no question about whether they should use "timer_shutdown_sync()" or not. IOW, things like patches 03, 17 and 31, and at least parts others in this series. This series clearly has several much more complex cases that need actual real code review, and I think it would help to have the completely unambiguous cases out of the way, just to get rid of noise. So I'd take that first patch, and a scripted set of "this cannot change any semantics" patches early. Linus
Re: [RFC][PATCH v3 00/33] timers: Use timer_shutdown*() before freeing timers
On Fri, Nov 4, 2022 at 12:42 PM Steven Rostedt wrote: > > Linus, should I also add any patches that has already been acked by the > respective maintainer? No, I'd prefer to keep only the ones that are 100% unambiguously not changing any semantics. Linus
Re: [PATCH v4a 00/38] timers: Use timer_shutdown*() before freeing timers
On Fri, Nov 4, 2022 at 11:01 PM Steven Rostedt wrote: > > Patch 1 fixes an issue with sunrpc/xprt where it incorrectly uses > del_singleshot_timer_sync() for something that is not a oneshot timer. As this > will be converted to shutdown, this needs to be fixed first. So this is the kind of thing that I would *not* want to get eartly. I really would want to get just the infrastructure in to let people start doing conversions. And then the "mindlessly obvious patches that are done by scripting and can not possibly matter". The kinds that do not *need* review, because they are mechanical, and that just cause pointless noise for the rest of the patches that *do* want review. Not this kind of thing that is so subtle that you have to explain it. That's not a "scripted patch for no semantic change". So leave the del_singleshot_timer_sync() cases alone, they are irrelevant for the new infrastructure and for the "mindless scripted conversion" patches. > Patches 2-4 changes existing timer_shutdown() functions used locally in ARM > and > some drivers to better namespace names. Ok, these are relevant. > Patch 5 implements the new timer_shutdown() and timer_shutdown_sync() > functions > that disable re-arming the timer after they are called. This is obviously what I'd want early so that people can start doign this in their trees. > Patches 6-28 change all the locations where there's a kfree(), kfree_rcu(), > kmem_cache_free() and one call_rcu() call where the RCU function frees the > timer (the workqueue patch) in the same function as the del_timer{,_sync}() is > called on that timer, and there's no extra exit path between the del_timer and > freeing of the timer. So honestly, I was literally hoping for a "this is the coccinelle script" kind of patch. Now there seems to be a number of patches here that are actualyl really hard to see that they are "obviously correct" and I can't tell if they are actually scripted or not. They don't *look* scripted, but I can't really tell. I looked at the patches with ten lines of context, and I didn't see the immediately following kfree() even in that expanded patch context, so it's fairly far away. Others in the series were *definitely* not scripted, doing clearly manual cleanups: -if (dch->timer.function) { -del_timer(&dch->timer); -dch->timer.function = NULL; -} +timer_shutdown(&dch->timer); so no, this does *not* make me feel "ok, this is all trivial". IOW, I'd really want *just* the infrastructure and *just* the provably trivial stuff. If it wasn't some scripted really obvious thing that cannot possibly change anything and that wasn't then edited manually for some reason, I really don't want it early. IOW, any early conversions I'd take are literally about removing pure mindless noise. Not about doing conversions. And I wouldn't mind it as a single conversion patch that has the coccinelle script as the explanation. Really just THAT kind of "100% mindless conversion". Linus
Re: [PATCH v4a 00/38] timers: Use timer_shutdown*() before freeing timers
On Sat, Nov 5, 2022 at 11:04 AM Steven Rostedt wrote: > > Here's the changes I made after running the script Please. No. What part of "I don't want extra crud" was I unclear on? I'm not interested in converting everything. That's clearly a 6.,2 issue, possibly even longer considering how complicated the networking side has been. I'm not AT ALL interested in "oh, I then added my own small cleanups on top to random files because I happened to notice them". Repeat after me: "If the script didn't catch them, they weren't trivially obvious". And it does seem that right now the script itself is a bit too generous, which is why it didn't notice that sometimes there wasn't a kfree after all because of a goto around it. So clearly that "..." doesn't really work, I think it accepts "_any_ path leads to the second situation" rather than "_all_ paths lead to the second situation". But yeah, my coccinelle-foo is very weak too, and maybe there's no pattern for "no flow control". I would also like the coccinelle script to notice the "timer is used afterwards", so that it does *not* modify that case that does del_timer(&dch->timer); dch->timer.function = NULL; since now the timer is modified in between the del_timer() and the kfree. Again, that timer modification is then made pointless by changing the del_timer() to a "timer_shutdown()", but at that point it is no longer a "so obvious non-semantic change that it should be scripted". At that point it's a manual thing. So I think the "..." in your script should be "no flow control, and no access to the timer", but do not know how to do that in coccinelle. Julia? And this thread has way too many participants, I suspect some email systems will just mark it as spam as a result. Which is partly *why* I would like to get rid of noisy changes that really don't matter - but I would like it to be truly mindlessly obvious that there are *zero* questions about it, and absolutely no manual intervention because the patch is so strict that it's just unquestionably correct. Linus
Re: [PATCH v4a 00/38] timers: Use timer_shutdown*() before freeing timers
On Sat, Nov 5, 2022 at 2:03 PM Jason A. Donenfeld wrote: > > Something that might help here is changing the `...` into > `... when exists` or into `... when != ptr` or similar. I actually tried that. You don't want "when exists", you'd want "when forall", but that seems to be the default. And trying "when != ptr->timer" actually does the right thing in that it gets rid of the case where the timer is modified outside of the del_timer() case, *but* it also causes odd other changes to the output. Look at what it generates for that drivers/media/usb/pvrusb2/pvrusb2-hdw.c file, which finds a lot of triggers with the "when != ptr->timer", but only does one without it. So I gave up, just because I clearly don't understand the rules. (Comparing output is also fun because the ordering of the patches is random, so consecutive runs with the same rule will give different patches. I assume that it's just because it's done in parallel, but it doesn't help the "try to see what changes when you change the script" ;) Linus
Re: [PATCH RFC 00/19] mm/gup: remove FOLL_FORCE usage from drivers (reliable R/O long-term pinning)
On Mon, Nov 7, 2022 at 8:18 AM David Hildenbrand wrote: > > So instead, make R/O long-term pinning work as expected, by breaking COW > in a COW mapping early, such that we can remove any FOLL_FORCE usage from > drivers. Nothing makes me unhappy from a quick scan through these patches. And I'd really love to just have this long saga ended, and FOLL_FORCE finally relegated to purely ptrace accesses. So an enthusiastic Ack from me. Linus
Re: [PATCH mm-unstable v1 20/20] mm: rename FOLL_FORCE to FOLL_PTRACE
On Wed, Nov 16, 2022 at 2:30 AM David Hildenbrand wrote: > > Let's make it clearer that functionality provided by FOLL_FORCE is > really only for ptrace access. I'm not super-happy about this one. I do understand the "let's rename the bit so that no new user shows up". And it's true that the main traditional use is ptrace. But from the patch itself it becomes obvious that no, it's not *just* ptrace. At least not yet. It's used for get_arg_page(), which uses it to basically look up (and install) pages in the newly created VM. Now, I'm not entirely sure why it even uses FOLL_FORCE, - I think it might be historical, because the target should always be the new stack vma. Following the history of it is a big of a mess, because there's a number of renamings and re-organizations, but it seems to go back to 2007 and commit b6a2fea39318 ("mm: variable length argument support"). Before that commit, we kept our own array of "this is the set of pages that I will install in the new VM". That commit basically just inserts the pages directly into the VM instead, getting rid of the array size limitation. So at a minimum, I think that FOLL_FORCE would need to be removed before any renaming to FOLL_PTRACE, because that's not some kind of small random case. It *might* be as simple as just removing it, but maybe there's some reason for having it that I don't immediately see. There _are_ also small random cases too, like get_cmdline(). Maybe that counts as ptrace, but the execve() case most definitely does not. Linus
Re: [PATCH mm-unstable v1 20/20] mm: rename FOLL_FORCE to FOLL_PTRACE
On Thu, Nov 17, 2022 at 2:58 PM Kees Cook wrote: > > Oh, er, why does get_arg_page() even need FOLL_FORCE? This is writing the > new stack contents to the nascent brpm->vma, which was newly allocated > with VM_STACK_FLAGS, which an arch can override, but they all appear to > include > VM_WRITE | VM_MAYWRITE. Yeah, it does seem entirely superfluous. It's been there since the very beginning (although in that original commit b6a2fea39318 it was there as a '1' to the 'force' argument to get_user_pages()). I *think* it can be just removed. But as long as it exists, it should most definitely not be renamed to FOLL_PTRACE. There's a slight worry that it currently hides some other setup issue that makes it matter, since it's been that way so long, but I can't see what it is. Linus
Re: [PATCH RFC 16/19] mm/frame-vector: remove FOLL_FORCE usage
On Tue, Nov 22, 2022 at 4:25 AM Hans Verkuil wrote: > > I tracked the use of 'force' all the way back to the first git commit > (2.6.12-rc1) in the very old video-buf.c. So it is very, very old and the > reason is lost in the mists of time. Well, not entirely. For archaeology reasons, I went back to the old BK history, which exists as a git conversion in https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/ and there you can actually find it. Not with a lot of explanations, though - it's commit b7649ef789 ("[PATCH] videobuf update"): This updates the video-buf.c module (helper module for video buffer management). Some memory management fixes, also some adaptions to the final v4l2 api. but it went from err = get_user_pages(current,current->mm, -data, dma->nr_pages, -rw == READ, 0, /* don't force */ +data & PAGE_MASK, dma->nr_pages, +rw == READ, 1, /* force */ dma->pages, NULL); in that commit. So it goes back to October 2002. > Looking at this old LWN article https://lwn.net/Articles/28548/ suggests > that it might be related to calling get_user_pages for write buffers The timing roughly matches. > I assume that removing FOLL_FORCE from 'FOLL_FORCE|FOLL_WRITE' will still > allow drivers to read from the buffer? The issue with some of the driver hacks has been that - they only want to read, and the buffer may be read-only - they then used FOLL_WRITE despite that, because they want to break COW (due to the issue that David is now fixing with his series) - but that means that the VM layer says "nope, you can't write to this read-only user mapping" - ... and then they use FOLL_FORCE to say "yes, I can". iOW, the FOLL_FORCE may be entirely due to an (incorrect, but historically needed) FOLL_WRITE. Linus
Re: drm/vkms: deadlock between dev->event_lock and timer
On Wed, 13 Sept 2023 at 07:21, Tetsuo Handa wrote: > > Hello. A deadlock was reported in drivers/gpu/drm/vkms/ . > It looks like this locking pattern remains as of 6.6-rc1. Please fix. > > void drm_crtc_vblank_off(struct drm_crtc *crtc) { > spin_lock_irq(&dev->event_lock); > drm_vblank_disable_and_save(dev, pipe) { > __disable_vblank(dev, pipe) { > crtc->funcs->disable_vblank(crtc) == vkms_disable_vblank { > hrtimer_cancel(&out->vblank_hrtimer) { // Retries with > dev->event_lock held until lock_hrtimer_base() succeeds. > ret = hrtimer_try_to_cancel(timer) { > base = lock_hrtimer_base(timer, &flags); // Fails forever because > vkms_vblank_simulate() is in progress. Heh. Ok. This is clearly a bug, but it does seem to be limited to just the vkms driver, and literally only to the "simulate vblank" case. The worst part about it is that it's so subtle and not obvious. Some light grepping seems to show that amdgpu has almost the exact same pattern in its own vkms thing, except it uses hrtimer_try_to_cancel(&amdgpu_crtc->vblank_timer); directly, which presumably fixes the livelock, but means that the cancel will fail if it's currently running. So just doing the same thing in the vkms driver probably fixes things. Maybe the vkms people need to add a flag to say "it's canceled" so that it doesn't then get re-enabled? Or maybe it doesn't matter and/or already happens for some reason I didn't look into. Linus
github version complaints about the gitlab CI requirements.txt
So every time I push to my github mirror, github now ends up having a 'dependabot' thing that warns about some of the CI version requirements for the gitlab automated testing file. It wants to update the pip requirements from 23.2.1 to 23.3 - When installing a package from a Mercurial VCS URL, e.g. pip install hg+..., with pip prior to v23.3, the specified Mercurial revision could be used to inject arbitrary configuration options to the hg clone call (e.g. --config). Controlling the Mercurial configuration can modify how and which repository is installed. This vulnerability does not affect users who aren't installing from Mercurial. and upgrade the urllib3 requirements from 2.0.4 to 2.0.7: - urllib3's request body not stripped after redirect from 303 status changes request method to GET - `Cookie` HTTP header isn't stripped on cross-origin redirects And it's not like any of this looks like a big deal, but I'd like to shut up the messages I get. I can either just close those issues, or I can apply a patch something like the attached (which also adds a missing newline at the end). I thought I should ask the people who actually set this up. Comments? Linus drivers/gpu/drm/ci/xfails/requirements.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/ci/xfails/requirements.txt b/drivers/gpu/drm/ci/xfails/requirements.txt index d8856d1581fd..e9994c9db799 100644 --- a/drivers/gpu/drm/ci/xfails/requirements.txt +++ b/drivers/gpu/drm/ci/xfails/requirements.txt @@ -5,7 +5,7 @@ termcolor==2.3.0 certifi==2023.7.22 charset-normalizer==3.2.0 idna==3.4 -pip==23.2.1 +pip==23.3 python-gitlab==3.15.0 requests==2.31.0 requests-toolbelt==1.0.0 @@ -13,5 +13,5 @@ ruamel.yaml==0.17.32 ruamel.yaml.clib==0.2.7 setuptools==68.0.0 tenacity==8.2.3 -urllib3==2.0.4 -wheel==0.41.1 \ No newline at end of file +urllib3==2.0.7 +wheel==0.41.1
Re: [git pull] drm for 6.8
On Wed, 10 Jan 2024 at 11:49, Dave Airlie wrote: > > Let me know if there are any issues, Your testing is seriously lacking. This doesn't even build. The reason seems to be that commit b49e894c3fd8 ("drm/i915: Replace custom intel runtime_pm tracker with ref_tracker library") changed the 'intel_wakeref_t' type from a 'depot_stack_handle_t' to 'unsigned long', and as a result did this: - drm_dbg(&i915->drm, "async_put_wakeref %u\n", + drm_dbg(&i915->drm, "async_put_wakeref %lu\n", power_domains->async_put_wakeref); meanwhile, the Xe driver has this: drivers/gpu/drm/xe/compat-i915-headers/intel_wakeref.h: typedef bool intel_wakeref_t; which has never been valid, but now the build fails with drivers/gpu/drm/i915/display/intel_display_power.c: In function ‘print_async_put_domains_state’: drivers/gpu/drm/i915/display/intel_display_power.c:408:29: error: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘int’ [-Werror=format=] because the drm header files have this disgusting thing where a *header* file includes a *C* file: In file included from ./include/drm/drm_mm.h:51, from drivers/gpu/drm/xe/xe_bo_types.h:11, from drivers/gpu/drm/xe/xe_bo.h:11, from ./drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h:11, from ./drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h:15, from drivers/gpu/drm/i915/display/intel_display_power.c:8: nasty. I made it build by fixing that broken Xe compat header file, but this is definitely *NOT* how things should have worked. How did this ever get to me without any kind of build testing? And why the %^!@$% does a header file include a C file? That's wrong regardless of this bug. Linus
Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
On Mon, 22 Jan 2024 at 15:17, Steven Rostedt wrote: > > Perhaps this is the real fix? If you send a signed-off version, I'll apply it asap. Thanks, Linus
Re: [BUG] BUG: kernel NULL pointer dereference at ttm_device_init+0xb4
On Mon, 22 Jan 2024 at 16:56, Bhardwaj, Rajneesh wrote: > > I think a fix might already be in flight. Please see Linux-Kernel Archive: > Re: [PATCH] drm/ttm: fix ttm pool initialization for no-dma-device drivers > (iu.edu) Please use lore.kernel.org that doesn't corrupt whitespace in patches or lose header information: https://lore.kernel.org/lkml/20240113213347.9562-1-pchel...@ispras.ru/ although that seems to be a strange definition of "in flight". It was sent out 8 days ago, and apparently nobody thought to include it in the drm fixes pile that came in last Friday. So it made it into rc1, even though it was reported a week before. It also looks like some mailing list there is mangling emails - if you use 'all' instead of 'lkml', lore reports multiple emails with the same message-id, and it all looks messier as a result. I assume it's dri-devel@lists.freedesktop.org that messes up, mainly because I don't tend to see this behaviour when only the usual kernel.org mailing lists are involved. Linus
Re: [PATCH next 10/11] block: Use a boolean expression instead of max() on booleans
On Sun, 28 Jan 2024 at 11:36, David Laight wrote: > > However it generates: > error: comparison of constant ‘0’ with boolean expression is always true > [-Werror=bool-compare] > inside the signedness check that max() does unless a '+ 0' is added. Please fix your locale. You have random garbage characters there, presumably because you have some incorrect locale setting somewhere in your toolchain. Linus
Re: [PATCH next 10/11] block: Use a boolean expression instead of max() on booleans
On Sun, 28 Jan 2024 at 14:22, David Laight wrote: > > H blame gcc :-) I do agree that the gcc warning quoting is unnecessarily ugly (even just visually), but.. > The error message displays as '0' but is e2:80:98 30 e2:80:99 > I HATE UTF-8, it wouldn't be as bad if it were a bijection. No, that's not the problem. The UTF-8 that gcc emits is fine. And your email was also UTF-8: Content-Type: text/plain; charset=UTF-8 The problem is that you clearly then used some other tool in between that took the UTF-8 byte stream, and used it as (presumably) Latin1, which is bogus. If you just make everything use and stay as UTF-8, it all works out beautifully. But I suspect you have an editor or a MUA that is fixed in some 1980s mindset, and when you cut-and-pasted the UTF-8, it treated it as Latin1. Just make all your environment be utf-8, like it should be. It's not the 80s any more. We don't do mullets, and we don't do Latin1, ok? Linus
Re: [git pull] drm fixes for 6.5-rc4
On Thu, 27 Jul 2023 at 19:20, Dave Airlie wrote: > > Regular scheduled fixes, msm and amdgpu leading the way, with some > i915 and a single misc fbdev, all seems fine. Pulled. Tangentially related: where do you keep your pgp key? The one I have is long expired, and doing a refresh doesn't get any updates... Linus
Re: [GIT PULL] fbdev fixes and updates for v6.7-rc3
On Sat, 25 Nov 2023 at 22:58, Helge Deller wrote: > > please pull some small fbdev fixes for 6.7-rc3. These all seem to be pure cleanups, not bug fixes. Please resend during the merge window. Linus
Re: [git pull] drm fixes for 6.8
On Wed, 3 Jan 2024 at 18:30, Dave Airlie wrote: > > These were from over the holiday period, mainly i915, a couple of > qaic, bridge and an mgag200. > > I have a set of nouveau fixes that I'll send after this, that might be > too rich for you at this point. > > I expect there might also be some more regular fixes before 6.8, but > they should be minor. I'm assuming you're just confused about the numbering, and meant 6.7 here and in the subject line. This seems to be too small of a pull to be an early pull request for the 6.8 merge window. Linus
Re: [git pull] drm for 6.2-rc1
On Mon, Dec 12, 2022 at 6:56 PM Dave Airlie wrote: > > There are a bunch of conflicts, one in amdgpu is a bit nasty, I've > cc'ed Christian/Alex to make sure they know to check whatever > resolution you find. The one I have is what we have in drm-tip tree. Hmm. My merge resolution is slightly different from yours. You seem to have basically dropped commit b09d6acba1d9 ("drm/amdgpu: handle gang submit before VMID"). Now, there are other fence changes in the drm tree that may mean that that commit *should* be dropped, so it's entirely possible that my resolution which kept that ordering change might be wrong and your resolution that just took the drm tip code is the right one. Christian? Alex? Can you please double-check what I just pushed out? Linus
Re: [git pull] drm for 6.2-rc1
On Wed, Dec 14, 2022 at 12:05 AM Christian König wrote: > > Anyway we need to re-apply b09d6acba1d9 which should be trivial. Note that my resolution did exactly that (*), it's just that when I double-checked against Dave's suggested merge that I noticed I'd done things differently than he did. (*) Well, when I say "did exactly that" I don't actually know some of the other fencing changes that happened, so there may be a reason why something further should still be done. So I can only point to my merge commit a594533df0f6 and ask people to verify. It does all at least work for me. Knock wood. Linus
Re: [PATCH v1 1/1] treewide: Align match_string() with sysfs_match_string()
On Tue, 4 Jun 2024 at 11:25, Rodrigo Vivi wrote: > > (I believe that the new _match_string(str1, size, str2) deserves a better > name, > but since I'm bad with naming stuff, I don't have any good suggestion) I hated the enormous cc list, so I didn't reply to all. But clearly everybody else is just doing so. Anyway, here's my NAK for this patch with explanation: https://lore.kernel.org/all/CAHk-=wg5f99-gzpetsasjd0jb0jgcdmmpehrxctt4_i83h8...@mail.gmail.com/ and part of it was the naming, but there were other oddities there too. Linus
Re: [PATCH v2 05/10] mm/util: Fix possible race condition in kstrdup()
On Thu, 13 Jun 2024 at 14:14, Andrew Morton wrote: > > The concept sounds a little strange. If some code takes a copy of a > string while some other code is altering it, yes, the result will be a > mess. This is why get_task_comm() exists, and why it uses locking. The thing is, get_task_comm() is terminally broken. Nobody sane uses it, and sometimes it's literally _because_ it uses locking. Let's look at the numbers: - 39 uses of get_task_comm() - 2 uses of __get_task_comm() because the locking doesn't work - 447 uses of raw "current->comm" - 112 uses of raw 'ta*sk->comm' (and possibly IOW, we need to just accept the fact that nobody actually wants to use "get_task_comm()". It's a broken interface. It's inconvenient, and the locking makes it worse. Now, I'm not convinced that kstrdup() is what anybody should use should, but of the 600 "raw" uses of ->comm, four of them do seem to be kstrdup. Not great, I think they could be removed, but they are examples of people doing this. And I think it *would* be good to have the guarantee that yes, the kstrdup() result is always a proper string, even if it's used for unstable sources. Who knows what other unstable sources exist? I do suspect that most of the raw uses of 'xyz->comm' is for printouts. And I think we would be better with a '%pTSK' vsnprintf() format thing for that. Sadly, I don't think coccinelle can do the kinds of transforms that involve printf format strings. And no, a printk() string still couldn't use the locking version. Linus
Re: [git pull] drm main pull for 3.4-rc1
On Wed, Mar 21, 2012 at 3:47 AM, Dave Airlie wrote: > > (oh and any warnings you see in i915 are gcc's fault from what anyone can > see). Christ those are annoying. Has anybody contacted the gcc people about this? Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] drm main pull for 3.4-rc1
Btw, I think this came in through the DRM merge: ERROR: "mdfld_set_brightness" [drivers/gpu/drm/gma500/gma500_gfx.ko] undefined! make[1]: *** [__modpost] Error 1 make: *** [modules] Error 2 this is just "make ARCH=i386" with allmodconfig. Linus On Wed, Mar 21, 2012 at 3:47 AM, Dave Airlie wrote: > > This is the main drm pull request, I'm probably going to send two more > smaller ones, will explain below. ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Linux 3.4-rc4
David & co, any ideas? There are other reports of problems with 3.3.x kernels, there's a report from Tim which may be related (also apparently working in 3.2, broken black screen in all 3.3.x). Nick, I realize you had trouble with a bisection already, but it might really be worth trying again. Do a git bisect visualize and try to pick a good commit (avoding the problems you hit) when you hit a problem, and then do git reset --hard to force bisection to try another place. That way you can sometimes avoid the problem spots, and continue the bisection. Linus On Sat, Apr 21, 2012 at 9:07 PM, Nick Bowler wrote: > On 2012-04-21 15:43 -0700, Linus Torvalds wrote: >> But none of it really looks all that scary. It looks like the 3.4 >> release is all on track, but please do holler if you see regressions. > > OK, I'll holler. Nouveau is still broken in mainline, as it has been > since March or so, first reported here: > > Subject: Regression: black screen on VGA w/ nouveau in Linux 3.3. > http://thread.gmane.org/gmane.linux.kernel/1269631 > > So far, attempts to elicit a response of any kind from maintainers has > been about as successful as talking to my doorknob. > > Thanks, > -- > Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/) > ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] mm: Work around Intel SNB GTT bug with some physical pages.
On Mon, May 7, 2012 at 4:13 PM, Stéphane Marchesin wrote: > > In the end, I came up with the ugly workaround of just leaking the > offending pages in shmem.c. Don't leak it. Instead, add it to some RCU list, and free it using RCU. Or some one-second timer or something. That kind of approach should guarantee that it (a) gets returned to the system but (b) the returning to the system gets delayed sufficiently that if the i915 driver is doing lots of allocations it will be getting other pages. Hmm? Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
drm/i915 3.5 merge window: gen6_sanitize_pm errors
These guys seem to be recently introduced: [drm:gen6_sanitize_pm] *ERROR* Power management discrepancy: GEN6_RP_INTERRUPT_LIMITS expected 1700, was 1206 [drm:gen6_sanitize_pm] *ERROR* Power management discrepancy: GEN6_RP_INTERRUPT_LIMITS expected 1707, was 1700 This is on my SNB Macbook Air. Everything seems to *work*, which makes me think: - that error isn't really so big a deal that you have to *SHOUT* about it. - I wonder how valid the discrepancy checking code is to begin with. Hmm? Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [GIT PULL]: dma-buf updates for 3.5
On Fri, May 25, 2012 at 2:17 AM, Sumit Semwal wrote: > > I am really sorry - I goofed up in the git URL (sent the ssh URL > instead). I was going to send you an acerbic email asking for your private ssh key, but then noticed that you had sent another email with the public version of the git tree.. > Could you please use > > git://git.linaro.org/people/sumitsemwal/linux-dma-buf.git > tags/tag-for-linus-3.5 > > instead, or should I send a new pull request with the corrected URL? Done. However, while your tag seems to be signed, your key is not available publicly: [torvalds@i5 ~]$ gpg --recv-key 7126925D gpg: requesting key 7126925D from hkp server pgp.mit.edu gpgkeys: key 7126925D not found on keyserver so I can't check if it is signed by anybody. Please do something like gpg --keyserver pgp.mit.edu --send-keys 7126925D to actually make your public key public. Of course, if it isn't public, I assume it hasn't actually been signed by anybody, which makes it largely useless. But any future signing action will validate the pre-signing uses of the key, so that's fixable. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
i915: eDP hot-unplug errors
Oops. This got sent without the right Cc, and the wrong subject (the people who were *supposed* to be cc'd instead got into the subject line, and the subject line got dropped entirely). Linus On Sun, May 27, 2012 at 4:09 PM, Linus Torvalds wrote: > A new worry about excessively verbose i915 driver "errors" that don't > actually seem to be errors. > > I got myself a micro-DP to VGA adapter so that I can use my Macbook > Air for presentations. > > And testing it, hotplugging seems to work very nicely, and gone are > the days when you had to do xrandr and crap. Things "JustWork(tm)". > > Goodie. > > Until you start looking at the dmesg log. Then it gets ugly. It's full > of scary-looking stuff like > > [ cut here ] > WARNING: at drivers/gpu/drm/i915/intel_dp.c:350 > intel_dp_check_edp+0x5d/0xb0() > Hardware name: MacBookAir4,1 > eDP powered off while attempting aux channel communication. > Modules linked in: fuse rfcomm bnep nf_conntrack_netbios_ns > nf_conntrack_broadcast .. > Pid: 2126, comm: kworker/0:0 Tainted: G W 3.4.0-08209-gae32adc #9 > Call Trace: > warn_slowpath_common+0x7a/0xb0 > warn_slowpath_fmt+0x41/0x50 > intel_dp_check_edp+0x5d/0xb0 > intel_dp_aux_ch+0x3f/0x330 > intel_dp_aux_native_read_retry+0xad/0x130 > intel_dp_detect+0x240/0x2c0 > output_poll_execute+0xba/0x1a0 > process_one_work+0x11b/0x3c0 > worker_thread+0x12e/0x2d0 > kthread+0x8e/0xa0 > kernel_thread_helper+0x4/0x10 > ---[ end trace 5756a4d08d9e2a83 ]--- > > which seems a bit extreme. I just unplugged the connector, I don't > think it should necessarily make quite this big a deal over it. What > does the WARN_ON() with the full call trace really buy us? > > Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: i915: GPU hung (F14, Intel Core i5-670)
On Mon, May 28, 2012 at 12:06 AM, Chris Wilson wrote: > > No, the i915_error_state had everything I needed to see. It is the old > ddx bug that was hardcoding a maximum relocation address that never > corresponded with an actual hw limit. As soon we try to use memory above > that value, the GPU decides not to listen to us any more. > > Fixed in xf86-video-intel 2.14.901 I really don't think that's the case. I have run the F14 X server for a *long* time without these issues on this machine, and today I now got a second GPU hang with the current git tree. I was in the middle of just writing an email in chrome, nothing fancy going on at all. So please please *please* take a second look. Because I think it's triggered by the i915 changes, or you undid a workaround that used to work fine. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: edp backtrace spam on MacBookAir4,1
On Wed, May 30, 2012 at 1:27 AM, Daniel Vetter wrote: > > Ok, Chris couldn't reproduce this on his mba. Can you please boot with > drm.debug=0xe, reproduce the noise and then attach the full dmesg? Hmm. Now *I* can't reproduce it either. I have updated my system in the meantime, so maybe this is related to that. However, I suspect it's more likely that it's some race condition, because when I got it, I got a *lot* of it, but they were all very tightly bunched together: [ 1588.996413] WARNING: at drivers/gpu/drm/i915/intel_dp.c:350 intel_dp_check_edp+0x5d/0xb0() [ 1588.996650] WARNING: at drivers/gpu/drm/i915/intel_dp.c:350 intel_dp_check_edp+0x5d/0xb0() [ 1589.000983] WARNING: at drivers/gpu/drm/i915/intel_dp.c:350 intel_dp_check_edp+0x5d/0xb0() [ 1589.001225] WARNING: at drivers/gpu/drm/i915/intel_dp.c:350 intel_dp_check_edp+0x5d/0xb0() [ 1589.005975] WARNING: at drivers/gpu/drm/i915/intel_dp.c:350 intel_dp_check_edp+0x5d/0xb0() [ 1589.006218] WARNING: at drivers/gpu/drm/i915/intel_dp.c:350 intel_dp_check_edp+0x5d/0xb0() [ 1589.010980] WARNING: at drivers/gpu/drm/i915/intel_dp.c:350 intel_dp_check_edp+0x5d/0xb0() [ 1589.011224] WARNING: at drivers/gpu/drm/i915/intel_dp.c:350 intel_dp_check_edp+0x5d/0xb0() [ 1589.015976] WARNING: at drivers/gpu/drm/i915/intel_dp.c:350 intel_dp_check_edp+0x5d/0xb0() [ 1589.016211] WARNING: at drivers/gpu/drm/i915/intel_dp.c:350 intel_dp_check_edp+0x5d/0xb0() [ 1589.020986] WARNING: at drivers/gpu/drm/i915/intel_dp.c:350 intel_dp_check_edp+0x5d/0xb0() [ 1589.021232] WARNING: at drivers/gpu/drm/i915/intel_dp.c:350 intel_dp_check_edp+0x5d/0xb0() ie that's 12 of those warnings (each of them with that huge backtrace etc), but they are all within 0.03 seconds of each other. So I suspect it needs to hit some particular timing window, and I was just (un)lucky. Because when I try it now with DRM debugging, I can't hit it. And just to make sure, I re-did the test without debugging too (in case the debugging would have changed timing), but can't reproduce it that way either. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel