Re: drm pull for v5.3-rc1
On Mon, Jul 15, 2019 at 10:37 AM Linus Torvalds wrote: > > I'm not pulling this. Why did you merge it into your tree, when > apparently you were aware of how questionable it is judging by the drm > pull request. Looking at some of the fallout, I also see that you then added that "adjust apply_to_pfn_range interface for dropped token" patch that seems to be for easier merging of this all. But you remove the 'token' entirely in one place, and in another you keep it and just say "whatever, it's unused, pass in NULL". WHAA? As part of looking at this all, I also note that some of this is also very non-kernely. The whole thing with trying to implement a "closure" in C is simply not how we do things in the kernel (although I've admittedly seen signs of it in some drivers). If this should be done at all (and that's questionable), at least do it in the canonical kernel way: pass in a separate "actor" function pointer and an argument block, don't try to mix function pointers and argument data and call it a "closure". We try to keep data and functions separate. It's not even for security concerns (although those have caused some splits in the past - avoid putting function pointers in structures that you then can't mark read-only!), it's a more generic issue of just keeping arguments as arguments - even if you then make a structure of them in order to not make the calling convention very complicated. (Yes, we do have the pattern of sometimes mixing function pointers with "describing data", ie the "struct file_operations" structure isn't _just_ actual function pointers, it also contains the module owner, for example. But those aren't about mixing function pointers with their arguments, it's about basically "describing" an object interface with more than just the operation pointers). So some of this code is stuff that I would have let go if it was in some individual driver ("Closures? C doesn't have closures! But whatever - that driver writer came from some place that taught lamda calculus before techning C"). But in the core mm code, I want reviews. And I want the code to follow normal kernel conventions. Linus
Re: DRM pull for v5.3-rc1
[ Ugh, I have three different threads about the drm pull because of the subject / html confusion. So now I'm replying in separate threads and I'm hoping the people involved have better threading than gmail does ;/ ] On Mon, Jul 15, 2019 at 5:29 AM Jason Gunthorpe wrote: > > The 'hmm' tree is something I ran to try and help workflow issues like > this, as it could be merged to DRM as a topic branch - maybe consider > this flow in future? > > Linus, do you have any advice on how best to handle sharing mm > patches? I don't have a lot of advice except for "very very carefully". I think the hmm tree worked really well this merge window, at least from my standpoint. But it is of course possible that my happiness about the hmm tree is a complete fluke and came about because pretty much all the patches were removing oddities and cleaning things up, and they weren't adding new odd things (or if they were, you hid it better ;^). Basically, people should know that there are some subsystems that I end up being *much* more worried about than others. If I see a pull request that modifies core VM of VFS code, I tend to go into "careful mode", in ways that I don't do when some maintainer sends me a pull request that obviously only changes some subtree that the maintainer owns. When driver maintainers send me patches that touch their drivers, I look at the diffstat. But when driver maintainers send me patches that change mm/, I look at individual commit messages and the actual diff itself, and if I see contentious stuff, I simply won't pull. It's why I left the hmm pull request (which came in early - thank you) until yesterday, simply because just from the diffstat I could tell that "ok, this merge isn't just me merging and doing sanity checks, this is me having to set aside time to do reviews". So just from the diffstat, I avoided even looking at it while I still had a mailbox full of other pull requests. So I do like seeing core mm changes come in through a separate branch. That's partly because that way it's easier to review without having the parts I care about be mixed up with other parts (it's one reason I asked the security layer pulls to be split up, to pick another area entirely as an example). But it's also partly because if you have a separate branch for the stuff that you know is contentious, that doesn't then hold up the parts that _aren't_. For example, right now I'm not pulling _any_ of the drm updates, simply because there's a part of it that I can't stomach. It would have been so much nicer if the contentious things that need a lot of care separate, and I could have at least pulled the other stuff. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: drm pull for v5.3-rc1
On Mon, Jul 15, 2019 at 11:29 AM Dave Airlie wrote: > > Not that I want to defend that code, but the mm patch that conflicts > already shows that removing the token is fine as nobody needs or > requires it. So the fixup patch in my tree was just a bridge to that patch, > which reduces conflicts. Rip the token out of the new API, pass it as NULL > to the old API until the mm patch is merged against it which drops the > token from the old API. Well, to me the "old" API looks like a new one too, since it's that "struct page_range_apply" thing. But I can appreciate that it makes for minimal patch to avoid conflicts with other patches. It just doesn't look very sensible stand-alone afaik. I might be missing something. But that last patch is more of a detail - it wouldn't even exist if it wasn't for the earlier mm patches, and those are the ones that make me go more than "Whaa?" so it's not like this is really all that big of an issue. More of just a note I made while looking through the mm parts. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: DRM pull for v5.3-rc1
On Mon, Jul 15, 2019 at 11:16 AM Linus Torvalds wrote: > > On Mon, Jul 15, 2019 at 5:29 AM Jason Gunthorpe wrote: > > > > The 'hmm' tree is something I ran to try and help workflow issues like > > this, as it could be merged to DRM as a topic branch - maybe consider > > this flow in future? > > > > Linus, do you have any advice on how best to handle sharing mm > > patches? > > I don't have a lot of advice except for "very very carefully". > > I think the hmm tree worked really well this merge window, at least > from my standpoint. Side note: I suspect that having a separate branch maintained by a separate person actually does help the "very carefully" part. I think the hmm branch ended up getting more "incidental review" simply because of how it was done. So even if the original reason for the separate branch was to resolve some quilt/git integration issues, I would not be at all surprised if just the extra indirection through another person ended up making both the sending and receiving side think more about each patch and think more about the abstraction. The hmm branch didn't actually seem to have any of the core VM people reviewing it either, but it did have reviewers across companies for all the patches, and I do think that that makes a difference. It's _soo_ easy for a patch series to be developed inside one company by a couple of people who are probably in the same group, and have the exact same objectives, to be a lot more biased (and likely biased not towards the mm goals, but the goals of the code _outside_ the mm). This is just a long-winded way to say "I do think that the separate and external branch with multiple interested parties" can have some inherent advantages, when you actually have multiple people looking at it with care and intent. (And here the fact that you have multiple subsystems looking at the code is very much part of what makes it a good model - if it was just an external branch for one single user - the vmware gfx driver - you wouldn't get the same kind of advantages. So it's not the "external branch" part itself, it's the "multiple users who care" part that likely causes people to think more about the end result) Again - maybe I'm rationalizing, and the hmm branch just randomly happened to work well this time. I do like having multiple people from different groups look at things, though. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: DRM pull for v5.3-rc1
On Mon, Jul 15, 2019 at 12:17 PM Jason Gunthorpe wrote: > > About the only thing I could concretely suggest for working with -mm > is if there was some way the -mm quilt patches could participate in > 'git merge' resolution at your level. Andrew did make noises about having multiple git branches. We'll see. Linus
Re: drm pull for v5.3-rc1
On Mon, Jul 15, 2019 at 12:36 PM Thomas Hellström (VMware) wrote: > > - I've never had any kernel code more reviewed than this. Hmm. It may have been reviewed, but that wasn't visible in the commits themselves, so when I look at the pull request, I don't see that. > - The combined callback / argument struct: It was strongly inspired by > the struct mm_walk (mm.h), the page walk code being quite similar in > functionality. The mm_walk struct is indeed a bit similar, and is in fact a bit problematic exactly because it mixes function pointers with non-const data. I wish it had been a 'const struct mm_walk *" that only passed in the stuff that describes what to do on the walk itself. Or separated into two different pointers - one for the "this is what to do for the walk" and one for "this is the walking data". In fact, I think tight now that is actually _almost_ the case and we could make them const, except for "walk->vma" which is updated dynamically as we walk. Oh well. And for all I know, some of the walkers may be modifying their "private" field too, since that's left to the walkers. So yes, that one also has some problems, I agree. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: drm pull for v5.3-rc1
On Mon, Jul 15, 2019 at 1:07 PM Linus Torvalds wrote: > > The mm_walk struct is indeed a bit similar, and is in fact a bit > problematic exactly because it mixes function pointers with non-const > data. This made me look at how nasty that would be to fix. Not too bad. The attached patch does add more lines than it removes, but in most cases it's actually a clear improvement. It results in: - smaller stackframes and less runtime initialization: the bulk of the 'mm_walk' structure was the ops pointers, and if we split them out and make them const, we can just initialize them statically, and the stack footprint now becomes just a single word. - the function pointers are now nicely in a const data section in addition to the whole "don't mix variable data with constants, and don't put function pointers on the stack" thing. Of course, I haven't _tested_ the end result, but since it compiles it must be perfect, right? Not that I tested all of the build either, since several of the mm_walk users were for other architectures. I'm not sure this is really worth it, but I'm throwing the patch out there in case somebody wants to look. Andrew, comments? I don't think we have anybody who is in charge of mm_walk outside of you... Linus arch/openrisc/kernel/dma.c | 10 -- arch/powerpc/mm/book3s64/subpage_prot.c | 5 ++- arch/s390/mm/gmap.c | 25 ++ fs/proc/task_mmu.c | 40 +++--- include/linux/mm.h | 23 + mm/hmm.c| 34 ++- mm/madvise.c| 10 -- mm/memcontrol.c | 11 -- mm/mempolicy.c | 5 ++- mm/migrate.c| 20 +-- mm/mincore.c| 5 ++- mm/mprotect.c | 5 ++- mm/pagewalk.c | 60 +++-- 13 files changed, 165 insertions(+), 88 deletions(-) diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c index b41a79fcdbd9..1a69a66fe257 100644 --- a/arch/openrisc/kernel/dma.c +++ b/arch/openrisc/kernel/dma.c @@ -80,8 +80,11 @@ arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, { unsigned long va; void *page; - struct mm_walk walk = { + static const struct mm_walk_ops ops = { .pte_entry = page_set_nocache, + }; + struct mm_walk walk = { + .ops = &ops, .mm = &init_mm }; @@ -111,8 +114,11 @@ arch_dma_free(struct device *dev, size_t size, void *vaddr, dma_addr_t dma_handle, unsigned long attrs) { unsigned long va = (unsigned long)vaddr; - struct mm_walk walk = { + static const struct mm_walk_ops ops = { .pte_entry = page_clear_nocache, + }; + struct mm_walk walk = { + .ops = &ops, .mm = &init_mm }; diff --git a/arch/powerpc/mm/book3s64/subpage_prot.c b/arch/powerpc/mm/book3s64/subpage_prot.c index 9ba07e55c489..7876b316138b 100644 --- a/arch/powerpc/mm/book3s64/subpage_prot.c +++ b/arch/powerpc/mm/book3s64/subpage_prot.c @@ -143,9 +143,12 @@ static void subpage_mark_vma_nohuge(struct mm_struct *mm, unsigned long addr, unsigned long len) { struct vm_area_struct *vma; + static const struct mm_walk_ops ops = { + .pmd_entry = subpage_walk_pmd_entry, + }; struct mm_walk subpage_proto_walk = { + .ops = &ops, .mm = mm, - .pmd_entry = subpage_walk_pmd_entry, }; /* diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c index 1e668b95e0c6..9e0feeb469c2 100644 --- a/arch/s390/mm/gmap.c +++ b/arch/s390/mm/gmap.c @@ -2523,9 +2523,14 @@ static int __zap_zero_pages(pmd_t *pmd, unsigned long start, static inline void zap_zero_pages(struct mm_struct *mm) { - struct mm_walk walk = { .pmd_entry = __zap_zero_pages }; + static const struct mm_walk_ops ops = { + .pmd_entry = __zap_zero_pages, + }; + struct mm_walk walk = { + .ops = &ops, + .mm = mm, + }; - walk.mm = mm; walk_page_range(0, TASK_SIZE, &walk); } @@ -2591,11 +2596,15 @@ static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr, int s390_enable_skey(void) { - struct mm_walk walk = { + static const struct mm_walk_ops ops = { .hugetlb_entry = __s390_enable_skey_hugetlb, .pte_entry = __s390_enable_skey_pte, }; struct mm_struct *mm = current->mm; + struct mm_walk walk = { + .ops = &ops, + .mm = mm, + }; struct vm_area_struct *vma; int rc = 0; @@ -2614,7 +2623,6 @@ int s390_enable_skey(void) } mm->def_flags &= ~VM_MERGEABLE; - walk.mm = mm; walk_page_range(0, TASK_SIZE, &walk); out_up: @@ -2635,10 +2643,15 @@ static int __s390_reset_cmma(pte_t *pte, unsigned long addr, void s390_reset_cmma(struct mm_struct *mm) { - struct mm_walk walk = { .pte_entry = __s390_reset_cmma }; + static const struct mm_walk_ops ops = { +
Re: [git pull] drm pull for 5.3-rc1
On Mon, Jul 15, 2019 at 11:38 AM Dave Airlie wrote: > > The reason I was waiting for the HMM tree to land, is a single silent > merge conflict needs to be resolved after merging this as below. There's more than that there. For example, the DRM_AMDGPU_USERPTR config has a depends on ARCH_HAS_HMM select HMM_MIRROR and that won't work any more. The hmm tree changed the requirements to be depends on HMM_MIRROR instead. Now, arguably the hmm tree change in this respect is an annoyance - the old model was much more user-friendly where the drivers that wanted HMM would just select it when it was available. See commit 43535b0aefab ("mm: remove the HMM config option"). I've done the merge, but my tests are still on-going. And after I've finished those, I'll compare against your suggested merge to see if I missed anything in turn.. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PULL] drm-next fixes for -rc1
On Fri, Jul 19, 2019 at 8:43 AM Daniel Vetter wrote: > > drm fixes for -rc1: > > nouveau: > - bugfixes + TU116 enabling (minor iteration):w Asking the important question: - WTH is that ":w" thing? I have a theory that it's just a "I'm confused by 'vi'" marker. Very understandable. But I'm also entertaining the possibility that it's a new "whistling guy" emoticon. Or maybe a "hungry baby bird face" emoticon. Admittedly not a _great_ new addition to the emoticons I've seen, but hey, I'm not judging. Much. I left it in the merge message for posterity, regardless. Linus
Re: [PATCH] fbdev: Ditch fb_edid_add_monspecs
On Sun, Jul 21, 2019 at 1:20 PM Daniel Vetter wrote: > > It's dead code ever since Lovely. Ack. Linus
Re: [git pull] drm for 5.19-rc1
On Tue, Jun 7, 2022 at 3:23 AM Geert Uytterhoeven wrote: > > These header files are heavy users of large constants lacking the "U" > suffix e.g.: > > #define NB_ADAPTER_ID__SUBSYSTEM_ID_MASK 0xL As Andreas says, this is not undefined behavior. A hexadecimal integer constant will always get a type that fits the actual value. So on a 32-bit architecture, because 0x doesn't fit in 'long', it will automatically become 'unsigned long'. Now, a C compiler might still warn about such implicit type conversions, but I'd be a bit surprised if any version of gcc actually would do that, because this behavior for hex constants is *very* traditional, and very common. It's also true that the type of the constant - but not the value - will be different on 32-bit and 64-bit architectures (ie on 64-bit, it will be plain "long" and never extended to "unsigned long", because the hex value obviously fits just fine). I don't see any normal situation where that really matters, since any normal use will have the same result. The case you point to at https://lore.kernel.org/r/cak8p3a0qrihbr_2fq7uz5w2jmljv7czfrrarcmmjohvndj3...@mail.gmail.com is very different, because the constant "1" is always just a plain signed "int". So when you do "(1 << 31)", that is now a signed integer with the top bit set, and so it will have an actual negative value, and that can cause various problems (when right-shifted, or when compared to other values). But hexadecimal constants can be signed types, but they never have negative values. Linus
Re: Report in ata_scsi_port_error_handler()
On Tue, Feb 15, 2022 at 10:37 PM Damien Le Moal wrote: > > On 2/16/22 13:16, Byungchul Park wrote: > > [2.051040] === > > [2.051406] DEPT: Circular dependency has been detected. > > [2.051730] 5.17.0-rc1-00014-gcf3441bb2012 #2 Tainted: GW > > [2.051991] --- > > [2.051991] summary > > [2.051991] --- > > [2.051991] *** DEADLOCK *** > > [2.051991] > > [2.051991] context A > > [2.051991] [S] (unknown)(&(&ap->eh_wait_q)->dmap:0) > > [2.051991] [W] __raw_spin_lock_irq(&host->lock:0) > > [2.051991] [E] event(&(&ap->eh_wait_q)->dmap:0) > > [2.051991] > > [2.051991] context B > > [2.051991] [S] __raw_spin_lock_irqsave(&host->lock:0) > > [2.051991] [W] wait(&(&ap->eh_wait_q)->dmap:0) > > [2.051991] [E] spin_unlock(&host->lock:0) > > Sleeping with a spinlock held would be triggering warnings already, so > these reports seem bogus to me. Yeah, Matthew pointed out the same thing for another use-case, where it looks like DEPT is looking at the state at the wrong point (not at the scheduling point, but at prepare_to_sleep()). This ata_port_wait() is the exact same pattern, ie we have spin_lock_irqsave(ap->lock, flags); while (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS)) { prepare_to_wait(&ap->eh_wait_q, &wait, TASK_UNINTERRUPTIBLE); spin_unlock_irqrestore(ap->lock, flags); schedule(); and DEPT has incorrectly taken it to mean that 'ap->lock' is held during the wait, when it is actually released before actually waiting. For the spin-locks, this is all very obvious (because they'd have been caught long ago by much simpler debug code), but the same prepare_to_wait -> wait pattern can most definitely happen with sleeping locks too, so they are all slightly suspect. And yes, the detailed reports are hard to read because the locations are given as "ata_port_wait_eh+0x52/0xc0". Running them through scripts/decode_stacktrace.sh to turn them into filename and line numbers - and also sort out inlining - would help a lot. Byungchul, could you fix those two issues? Some of your reports may well be entirely valid, but the hard-to-read hex offsets and the knowledge that at least some of them are confused about how prepare_to_wait -> wait actually works makes the motivation to look at the details much less.. Linus
Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)
On Wed, May 4, 2022 at 1:19 AM Byungchul Park wrote: > > Hi Linus and folks, > > I've been developing a tool for detecting deadlock possibilities by > tracking wait/event rather than lock(?) acquisition order to try to > cover all synchonization machanisms. So what is the actual status of reports these days? Last time I looked at some reports, it gave a lot of false positives due to mis-understanding prepare_to_sleep(). For this all to make sense, it would need to not have false positives (or at least a very small number of them together with a way to sanely get rid of them), and also have a track record of finding things that lockdep doesn't. Maybe such reports have been sent out with the current situation, and I haven't seen them. Linus
Re: [git pull] drm for 5.20/6.0
On Tue, Aug 2, 2022 at 10:38 PM Dave Airlie wrote: > > This is a conflicty one. The late revert in 5.19 of the amdgpu buddy > allocator causes major conflict fallout. The buddy allocator code in > this one works, so the resolutions are usually just to take stuff from > this. It might actually be cleaner if you revert > 925b6e59138cefa47275c67891c65d48d3266d57 (Revert "drm/amdgpu: add drm > buddy support to amdgpu") first in your tree then merge this. Ugh, what a pain. The other conflicts are also due to just randomly duplicated commits, with *usually* your drm tree having the superset (so "just take yours" is the easy resolution), but not always (ie the Intel firmware-69 mess was apparently not dealt with in the development tree). Nasty. I think I have it resolved, am still doing a full build test, and will then compare against what your suggested merge is. Linus
Re: [git pull] drm for 5.20/6.0
On Wed, Aug 3, 2022 at 7:46 PM Linus Torvalds wrote: > > I think I have it resolved, am still doing a full build test, and will > then compare against what your suggested merge is. Hmm. I end up with *almost* the same thing. Except I ended up with a select DRM_BUDDY for the DRM_AMDGPU config entry, and you don't have that. I *think* my version is correct, in that clearly the amdgpu driver now uses that buddy logic (just doing a random "grep drm_buddy_block" to see). But this was messy enough to resolve that I think people should double-check my end, and maybe I just got confused at some point in the process. And while I seem to have gotten the same result as you did on the i915 firmware side too, again, I'd like people to re-verify. Linus
Re: [git pull] drm for 5.20/6.0
On Wed, Aug 3, 2022 at 8:37 PM Dave Airlie wrote: > > Actually I did miss that so that looks good. .. I wish it did, but I just actually test-booted my desktop with the result, and it crashes the X server. This seems to be the splat in Xorg.0.log: (II) Initializing extension DRI2 (II) AMDGPU(0): Setting screen physical size to 2032 x 571 (EE) (EE) Backtrace: (EE) 0: /usr/libexec/Xorg (OsLookupColor+0x13d) [0x55b1dc61258d] (EE) 1: /lib64/libc.so.6 (__sigaction+0x50) [0x7f7972a3ea70] (EE) 2: /usr/lib64/xorg/modules/drivers/amdgpu_drv.so (AMDGPUCreateWindow_oneshot+0x101) [0x7f797207ddd1] (EE) 3: /usr/libexec/Xorg (compIsAlternateVisual+0xdc4) [0x55b1dc545fa4] (EE) 4: /usr/libexec/Xorg (InitRootWindow+0x17) [0x55b1dc4e0047] (EE) 5: /usr/libexec/Xorg (miPutImage+0xd4c) [0x55b1dc49e60b] (EE) 6: /lib64/libc.so.6 (__libc_start_call_main+0x80) [0x7f7972a29550] (EE) 7: /lib64/libc.so.6 (__libc_start_main+0x89) [0x7f7972a29609] (EE) 8: /usr/libexec/Xorg (_start+0x25) [0x55b1dc49f2c5] (EE) (EE) Segmentation fault at address 0x4 (EE) Fatal server error: (EE) Caught signal 11 (Segmentation fault). Server aborting so something is going horribly wrong. No kernel oops, though. It works on my intel laptop, so it's amdgpu somewhere. I guess I will start bisecting. Oy vey. Linus
Re: [git pull] drm for 5.20/6.0
On Wed, Aug 3, 2022 at 8:53 PM Dave Airlie wrote: > > > It works on my intel laptop, so it's amdgpu somewhere. > > I'll spin my ryzen up to see if I can reproduce, and test against the > drm-next pre-merge tree as well. So it's not my merge - I've had a bad result in the middle of the DRM history too. On a positive note, my arm64 machine works fine, but that's just using fbdev so ... But another datapoint to say that it's amdgpu-specific. Not that that was really in doubt. Linus
Re: [git pull] drm for 5.20/6.0
On Wed, Aug 3, 2022 at 9:24 PM Dave Airlie wrote: > > I've reproduced it, I'll send you a revert pile when I confirm it is > the buddy allocator. I've bisected it to 86bd6706c404..074293dd9f61 and don't see "buddy" in any of those commits. I'll do a few more. It's close enough already that it should be just four more reboots to pinpoint exactly which commit breaks. Linus
Re: [git pull] drm for 5.20/6.0
On Wed, Aug 3, 2022 at 9:27 PM Linus Torvalds wrote: > > I'll do a few more. It's close enough already that it should be just > four more reboots to pinpoint exactly which commit breaks. commit 5d945cbcd4b16a29d6470a80dfb19738f9a4319f is the first bad commit. I think it's supposed to make no semantic changes, but it clearly does. What a pain to figure out what's wrong in there, and I assume it doesn't revert cleanly either. Bringing in the guilty parties. See https://lore.kernel.org/all/CAHk-=wj+yzaunxiewhfcrkbdlsqkizdr1q3yjlaqpo6avq2...@mail.gmail.com/ for the beginning of this thread. Linus
Re: mainline build failure due to 6fdd2077ec03 ("drm/amd/amdgpu: add memory training support for PSP_V13")
On Thu, Aug 4, 2022 at 12:35 AM Sudip Mukherjee (Codethink) wrote: > > I will be happy to test any patch or provide any extra log if needed. It sounds like that file just needs to get a #include there, and for some reason architectures other than alpha and mips end up getting it accidentally through other headers. Mind testing just adding that header file, and perhaps even sending a patch if (when) that works for you? Linus
Re: [PATCH] drm/amd/display: restore plane with no modifiers code.
On Wed, Aug 3, 2022 at 10:50 PM Dave Airlie wrote: > > With this applied, I get gdm back. I can confirm that it makes thing work again for me too. Applied. Linus
Re: mainline build failure for x86_64 allmodconfig with clang
On Thu, Aug 4, 2022 at 11:37 AM Sudip Mukherjee (Codethink) wrote: > > git bisect points to 3876a8b5e241 ("drm/amd/display: Enable building new > display engine with KCOV enabled"). Ahh. So that was presumably why it was disabled before - because it presumably does disgusting things that make KCOV generate even bigger stack frames than it already has. Those functions do seem to have fairly big stack footprints already (I didn't try to look into why, I assume it's partly due to aggressive inlining, and probably some automatic structures on stack). But gcc doesn't seem to make it all that much worse with KCOV (and my clang build doesn't enable KCOV). So it's presumably some KCOV-vs-clang thing. Nathan? Linus
Re: mainline build failure for x86_64 allmodconfig with clang
On Thu, Aug 4, 2022 at 1:43 PM Nathan Chancellor wrote: > > I do note that commit 1b54a0121dba ("drm/amd/display: Reduce stack size > in the mode support function") did have a workaround for GCC. It appears > clang will still inline mode_support_configuration(). If I mark it as > 'noinline', the warning disappears in that file. That sounds like probably the best option for now. Gcc does not inline that function (at least for allmodconfig builds in my testing), so if that makes clang match what gcc does, it seems a reasonable thing to do. Linus
Re: mainline build failure for x86_64 allmodconfig with clang
On Sun, Aug 7, 2022 at 10:36 AM David Laight wrote: > > Or just shoot the software engineer who thinks 100 arguments > is sane. :-) I suspect the issue is that it's not primarily a software engineer who wrote that code. Hardware people writing code are about as scary as software engineers with a soldering iron. Linus
drm pull request (was Re: )
On Thu, May 5, 2022 at 9:07 PM Dave Airlie wrote: > > pretty quiet week, one fbdev, msm, kconfig, and 2 amdgpu fixes, about > what I'd expect for rc6. You're not getting the automated pr-tracker-bot response, because your subject line was missing... Just a "how did that happen" together with a "here's the manual response instead". Linus
Re: Adding CI results to the kernel tree was Re: [RFC v2] drm/msm: Add initial ci/ subdirectory
On Tue, May 10, 2022 at 10:07 PM Dave Airlie wrote: > > > And use it to store expectations about what the drm/msm driver is > > supposed to pass in the IGT test suite. > > I wanted to loop in Linus/Greg to see if there are any issues raised > by adding CI results file to the tree in their minds, or if any other > subsystem has done this already, and it's all fine. > > I think this is a good thing after our Mesa experience, but Mesa has a > lot tighter integration here, so I want to get some more opinions > outside the group. Honestly, my immediate reaction is that I think it might be ok, but (a) are these things going to absolutely balloon over time? (b) should these not be separated out? Those two issues kind of interact. If it's a small and targeted test-suite, by all means keep it in the kernel, but why not make it part of "tools/testing/selftests" But if people expect this to balloon and we end up having megabytes of test output, then I really think it should be a separate git tree. A diffstat like this: > 7 files changed, 791 insertions(+) is not a problem at all. But I get the feeling that this is just the tip of the iceberg, and people will want to not just have the result files, but start adding actual *input* files that may be largely automated stuff and may be tens of megabytes in size. Because the result files on their own aren't really self-contained, and then people will want to keep them in sync with the test-files themselves, and start adding those, and now it *really* is likely very unwieldy. Or if that doesn't happen, and the actual input test files stay in a separate CI repo, and then you end up having random coherency issues with that CI repo, and it all gets to be either horribly messy, or the result files in the kernel end up really stale. So honestly, I personally don't see a good end result here. This particular small patch? *This* one looks fine to me, except I really think tools/testing/selftests/gpu would be a much more logical place for it. But I don't see a way forward that is sane. Can somebody argue otherwise? Linus
Re: Adding CI results to the kernel tree was Re: [RFC v2] drm/msm: Add initial ci/ subdirectory
On Wed, May 11, 2022 at 11:40 AM Rob Clark wrote: > > It is missing in this revision of the RFC, but the intention is to > have the gitlab-ci.yml point to a specific commit SHA in the > gfx-ci/drm-ci[1] tree, to solve the problem of keeping the results in > sync with the expectations. Ie. a kernel commit would control moving > to a new version of i-g-t (and eventually deqp and/or piglit), and at > the same time make any necessary updates in the expectations files. Wouldn't it then be better to just have the expectation files in the ci tree too? The kernel tree might have just the expected *failures* listed, if there are any. Presumably the ci tree has to have the expected results anyway, so what's the advantage of listing non-failures? Linus
Re: Adding CI results to the kernel tree was Re: [RFC v2] drm/msm: Add initial ci/ subdirectory
On Wed, May 11, 2022 at 12:08 PM Linus Torvalds wrote: > > The kernel tree might have just the expected *failures* listed, if > there are any. Presumably the ci tree has to have the expected results > anyway, so what's the advantage of listing non-failures? .. put another way: I think a list of "we are aware that these currently fail" is quite reasonable for a development tree, maybe even with a comment in the commit that created them about why they currently fail. That also ends up being very nice if you fix a problem, and the fix commit might then remove the failure for the list, and that all makes perfect sense. But having just the raw output of "these are the expected CI results" that is being done and specified by some other tree entirely - that seems pointless and just noise to me. There's no actual reason to have that kind of noise - and update that kind of noise - that I really see. Linus
Re: [git pull] drm for 5.19-rc1
On Tue, May 24, 2022 at 11:07 PM Dave Airlie wrote: > > AMD has started some new GPU support [...] Oh Christ. Which means another set of auto-generated monster headers. Lovely. Linus
Re: mainline build failure due to f1e4c916f97f ("drm/edid: add EDID block count and size helpers")
On Fri, May 27, 2022 at 2:07 AM Sudip Mukherjee wrote: > > declared with attribute error: BUILD_BUG_ON failed: sizeof(*edid) != > EDID_LENGTH > > And, reverting it on top of mainline branch has fixed the build failure. Hmm. That BUILD_BUG_ON() looks entirely correct, and if that sizeof() doesn't work, then the code doesn't work. I'm not seeing what could go wrong in there, with all the structures I see being marked as __packed__. I wonder if the union in 'struct detailed_timing' also wants that "__attribute__((packed))" but I also wonder what it is that would make this fail on spear3xx but not elsewhere. Very strange. It would be interesting to know where that sizeof goes wrong, but it would seem to be something very peculiar to your build environment (either that config, or your compiler). Linus
Re: mainline build failure due to f1e4c916f97f ("drm/edid: add EDID block count and size helpers")
On Fri, May 27, 2022 at 4:41 PM Sudip Mukherjee wrote: > > I just tested with various values, sizeof(*edid) is 144 bytes at that place. Hmm. What compiler do you have? Because it seems very broken. You don't actually have to try with various sizes, you could have just done something like int size_of_edid(const struct edid *edid) { return sizeof(*edid); } and then "make drivers/gpu/drm/drm_edid.s" to generate assembly and see what it looks like (obviously removing the BUG_ON() in order to build). That obviously generates code like movl$128, %eax ret for me, and looking at the definition of that type I really can't see how it would ever generate anything else. But it's apparently not even close for you. I suspect some of the structs inside of that 'struct edid' end up getting aligned, despite the '__attribute__((packed))'. For example, 'struct est_timings' is supposed to be just 3 bytes, and it's at an odd offset too (byte offset 35 in the 'struct edid' if I did the math correctly). But it obviously doesn't happen for me or for most other people, so it's something in your setup. Unusual compiler? Linus
Re: mainline build failure due to f1e4c916f97f ("drm/edid: add EDID block count and size helpers")
On Sat, May 28, 2022 at 5:13 AM Sudip Mukherjee wrote: > > just tried this with > make ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- drivers/gpu/drm/drm_edid.s > > size_of_edid: > mov r0, #144@, > ldmfd sp, {fp, sp, pc}@ So digging a bit deeper - since I have am arm compiler after all - I note that 'sizeof(detailed_timings)' is 88. Which is completely wrong. It should be 72 bytes (an array of 4 structures, each 18 bytes in size). I have not dug deeper, but that is clearly the issue. Now, why that only happens on that spear3xx_defconfig, I have no idea. Linus
Re: mainline build failure due to f1e4c916f97f ("drm/edid: add EDID block count and size helpers")
On Sat, May 28, 2022 at 10:40 AM Linus Torvalds wrote: > > So digging a bit deeper - since I have am arm compiler after all - I > note that 'sizeof(detailed_timings)' is 88. Hmm. sizeof() both detailed_timings[0].data.other_data.data.range.formula.gtf2 and detailed_timings[0].data.other_data.data.range.formula.cvt is 7. But the *union* of those things is detailed_timings[0].data.other_data.data.range.formula and its size is 8 (despite having an alignment of just 1). The attached patch would seem to fix it for me. Not very much tested, and I have no idea what it is that triggers this only on spear3xx_defconfig. Some ARM ABI issue that is triggered by some very particular ARM compiler flag enabled by that config? Adding some ARM (and SPEAR, and SoC) people in case they have any idea. This smells like a compiler bug triggered by "there's a 16-bit member field in that gtf2 structure, and despite it being packed and aligned to 1, we somehow still align the size to 2". I dunno. But marking those unions packed too doesn't seem wrong, and does seem to fix it. Linus include/drm/drm_edid.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/drm/drm_edid.h b/include/drm/drm_edid.h index c3204a58fb09..b2756753370b 100644 --- a/include/drm/drm_edid.h +++ b/include/drm/drm_edid.h @@ -121,7 +121,7 @@ struct detailed_data_monitor_range { u8 supported_scalings; u8 preferred_refresh; } __attribute__((packed)) cvt; - } formula; + } __attribute__((packed)) formula; } __attribute__((packed)); struct detailed_data_wpindex { @@ -154,7 +154,7 @@ struct detailed_non_pixel { struct detailed_data_wpindex color; struct std_timing timings[6]; struct cvt_timing cvt[4]; - } data; + } __attribute__((packed)) data; } __attribute__((packed)); #define EDID_DETAIL_EST_TIMINGS 0xf7 @@ -172,7 +172,7 @@ struct detailed_timing { union { struct detailed_pixel_timing pixel_data; struct detailed_non_pixel other_data; - } data; + } __attribute__((packed)) data; } __attribute__((packed)); #define DRM_EDID_INPUT_SERRATION_VSYNC (1 << 0)
Re: mainline build failure due to f1e4c916f97f ("drm/edid: add EDID block count and size helpers")
On Sat, May 28, 2022 at 11:59 AM Arnd Bergmann wrote: > > It's CONFIG_ARM_AEABI, which is normally set everywhere. Without this > option, you the kernel is built for the old 'OABI' that forces all non-packed > struct members to be at least 16-bit aligned. Looks like forced word (32 bit) alignment to me. I wonder how many other structures that messes up, but I committed the EDID fix for now. This has presumably been broken for a long time, but maybe the affected targets don't typically use EDID and kernel modesetting, and only use some fixed display setup instead. Those structure definitions go back a _loong_ time (from a quick 'git blame' I see November 2008). But despite that, I did not mark my fix 'cc:stable' because I don't know if any of those machines affected by this bad arm ABI issue could possibly care. At least my tree hopefully now builds on them, with the BUILD_BUG_ON() that uncovered this. Linus
Re: mainline build failure due to f1e4c916f97f ("drm/edid: add EDID block count and size helpers")
On Tue, May 31, 2022 at 1:04 AM Arnd Bergmann wrote: > > As an experiment: what kind of results would we get when looking > for packed structures and unions that contain any of these: Yeah, any atomics or locks should always be aligned, and won't even work (or might be *very* slow) on multiple architectures. Even x86 - which does very well on unaligned data - reacts badly to sufficiently unaligned atomics (ie cacheline crossing). I don't think we have that. Not only because it would already cause breakage, but simply because the kinds of structures that people pack aren't generally the kind that contain these kinds of things. That said, you might have a struct that is packed, but that intentionally aligns parts of itself, so it *could* be valid. But it would probably not be a bad idea to check that packed structures/unions don't have atomic types or locks in them. I _think_ we're all good, but who knows.. Linus
Re: mainline build failure due to f1e4c916f97f ("drm/edid: add EDID block count and size helpers")
On Wed, Jun 1, 2022 at 3:28 PM Keisuke Nishimura wrote: > > > I found 13 definitions of packed structure that contains: > > - spinlock_t > > - atomic_t > > - dma_addr_t > > - phys_addr_t > > - size_t > > - struct mutex > > - struct device > > - raw_spinlock_t Ok, so I don't think dma_addr_t/phys_addr_t/size_t are problematic, they are just regular integers. And 'struct device' is problematic only as it then contains any of the atomic types (which it does) > security/tomoyo/common.h: atomic_t in tomoyo_shared_acl_head > drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls.h: spinlock_t in > key_map > arch/s390/include/asm/kvm_host.h: atomic_t in kvm_s390_sie_block So these do look problematic. I'm not actually clear on why tomoyo_shared_acl_head would be packed. That makes no sense to me. Same goes for key_map, it's not clear what the reason for that __packed is, and it's clearly bogus. It might work, almost by mistake, but it's wrong to try to pack that spinlock_t. The s390 kvm use actually looks fine: the structure is packed, but it's also aligned, and the spin-lock is at the beginning, so the "packing" part is about the other members, not the first one. The two that look wrong look like they will probably work anyway (they'll presumably be effectively word-aligned, and that's sufficient for spinlocks in practice). But let's cc the tomoyo and chelsio people. Linus
Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash
So this has been going on for a while, and it's quite annoying. At bootup, my main desktop (Threadripper 3970X with radeon graphics) now complains about resource sanity check: requesting [mem 0xd000-0xdfff], which spans more than BOOTFB [mem 0xd000-0xd02f] and then gives me a nasty callchain that is basically the amdgpu probe sequence ending in amdgpu_bo_init() doing the arch_io_reserve_memtype_wc() which is then what complains. That "BOOTFB" resource is from sysfb_simplefb.c, and I think what started triggering this is commit c96898342c38 ("drivers/firmware: Don't mark as busy the simple-framebuffer IO resource"). Because it turns out that that removed the IORESOURCE_BUSY, which in turn is what makes the resource conflict code complain about it now, because /* * if a resource is "BUSY", it's not a hardware resource * but a driver mapping of such a resource; we don't want * to warn for those; some drivers legitimately map only * partial hardware resources. (example: vesafb) */ so the issue is that now the resource code - correctly - says "hey, you have *two* conflicting driver mappings". And that commit claims it did it because "which can lead to drivers requesting the same memory resource to fail", but - once again - the link in the commit that might actually tell more is just one of those useless patch submission links again. So who knows why that commit was actually done, but it's causing annoyance. If simplefb is actually still using that frame buffer, it's a problem. If it isn't, then maybe that resource should have been released? I really think that commit c96898342c38 is buggy. It talks about "let drivers to request it as busy instead", but then it registers a resource that isn't actually a proper real resource. It's just a random incomplete chunk of the actual real thing, so it will still interfere with resource allocation, and in fact now interferes even with that "set memtype" because of this valid warning. Linus
Re: Annoying AMDGPU boot-time warning due to simplefb / amdgpu resource clash
On Mon, Jun 27, 2022 at 1:02 AM Javier Martinez Canillas wrote: > > The flag was dropped because it was causing drivers that requested their > memory resource with pci_request_region() to fail with -EBUSY (e.g: the > vmwgfx driver): > > https://www.spinics.net/lists/dri-devel/msg329672.html See, *that* link would have been useful in the commit. Rather than the useless link it has. Anyway, removing the busy bit just made things worse. > > If simplefb is actually still using that frame buffer, it's a problem. > > If it isn't, then maybe that resource should have been released? > > It's supposed to be released once amdgpu asks for conflicting framebuffers > to be removed calling drm_aperture_remove_conflicting_pci_framebuffers(). That most definitely doesn't happen. This is on a running system: [torvalds@ryzen linux]$ cat /proc/iomem | grep BOOTFB - : BOOTFB so I suspect that the BUSY bit was never the problem - even for vmwgfx). The problem was that simplefb doesn't remove its resource. Guys, the *reason* for resource management is to catch people that trample over each other's resources. You literally basically disabled the code that checked for it by removing the BUSY flag, and just continued to have conflicting resources. That isn't a "fix", that is literally "we are ignoring and breaking the whole reason that the resource tree exists, but we'll still use it for no good reason". Yeah, yeah, most modern drivers ignore the IO resource tree, because they end up working on another resource level entirely: they work on not the IO resources, but on the "driver level" instead, and just attach to PCI devices. So these days, few enough drivers even care about the IO resource tree, and it's mostly used for (a) legacy devices (think ISA) and (b) the actual bus resource handling (so the PCI code itself uses it to sort out resource use and avoid conflicts, but PCI drivers themselves generally then don't care, because the bus has "taken care of it". So that's why the amdgpu driver itself doesn't care about resource allocations, and we only get a warning for that memory type case, not for any deeper resource case. And apparently the vmwgfx driver still uses that legacy "let's claim all PCI resources in the resource tree" instead of just claiming the device itself. Which is why it hit this whole BOOTFB resource thing even harder. But the real bug is that BOOTFB seems to claim this resource even after it is done with it and other drivers want to take over. Not the BUSY bit. Linus
Re: [git pull] drm fixes for 5.18-rc2
On Thu, Apr 7, 2022 at 2:20 PM Dave Airlie wrote: > > I think this should fix the amdgpu splat you have been seeing since rc1. Not the machine I'm currently traveling with, but I'll double-check when I'm back home. Thanks, Linus
Re: [git pull] drm fixes for 5.18-rc3
On Thu, Apr 14, 2022 at 2:33 PM Dave Airlie wrote: > > git://anongit.freedesktop.org/drm/drm tags/drm-fixes-2022-04-15 .. and since I'm now back home, I can confirm that my boot-time splat I saw before is all gone. It presumably went away with the previous pull request already, but I hadn't remembered to check the issue until this pull reminded me about the issue. Thanks, Linus
Re: [PATCH v13 5/9] drm/i915: Check for integer truncation on scatterlist creation
On Wed, Sep 28, 2022 at 1:15 AM Gwan-gyeong Mun wrote: > > + if (check_assign(obj->base.size >> PAGE_SHIFT, &npages)) > + return -E2BIG; I have to say, I find that new "check_assign()" macro use to be disgusting. It's one thing to check for overflows. It's another thing entirely to just assign something to a local variable. This disgusting "let's check and assign" needs to die. It makes the code a completely unreadable mess. The "user" wersion is even worse. If you worry about overflow, then use a mix of (a) use a sufficiently large type to begin with (b) check for value range separately and in this particular case, I also suspect that the whole range check should have been somewhere else entirely - at the original creation of that "obj" structure, not at one random end-point where it is used. In other words, THIS WHOLE PATCH is just end-points checking the size requirements of that "base.size" thing much too late, when it should have been checked originally for some "maximum acceptable base size" instead. And that "maximum acceptable base size" should *not* be about "this is the size of the variables we use". It should be a sanity check of "this value is sane and fits in sane use cases". Because "let's plug security checks" is most definitely not about picking random assignments and saying "let's check this one". It's about trying to catch things earlier than that. Kees, you need to reign in the craziness in overflow.h. Linus
Re: [git pull] drm fixes for 5.19-rc7
On Fri, Jul 15, 2022 at 2:09 PM Nathan Chancellor wrote: > > On Fri, Jul 15, 2022 at 01:36:17PM +1000, Dave Airlie wrote: > > Matthew Auld (1): > > drm/i915/ttm: fix sg_table construction > > This patch breaks i386_defconfig with both GCC and clang: > > ld: drivers/gpu/drm/i915/i915_scatterlist.o: in function > `i915_rsgt_from_mm_node': > i915_scatterlist.c:(.text+0x1a7): undefined reference to `__udivdi3' Yeah, we definitely don't want arbitrary 64x64 divides in the kernel, and the fact that we don't include libgcc.a once again caught this horrid issue. The offending code is if (sg_alloc_table(st, DIV_ROUND_UP(node->size, segment_pages), GFP_KERNEL)) { and I have to say that all of those games with "u64 page_alignment" that commit aff1e0b09b54 ("drm/i915/ttm: fix sg_table construction") introduces are absolutely disgusting. And they are just *pointlessly* disgusting. Why is that "page_alignment" a "u64"? And why is it a "size", instead of being a "number of bits"? The code literally does things like const u64 max_segment = round_down(UINT_MAX, page_alignment); which means that (a) page_alignment must be a power-of-two for this to work (round_down() only works in powers of two) (b) the result obviously must fit in an "unsigned int", since it's rounding down a UINT_MAX! So (a) makes it doubtful that "page_alignment" should have been a value (as opposed to mask), and (b) then questions why was that made an "u64" value when it cannot have a u64 range? And if max_segments cannot have a 64-bit range, then segment_pages here: u64 segment_pages = max_segment >> PAGE_SHIFT; sure cannot. Fixing those then uncovers other things: len = min(block_size, max_segment - sg->length); now complains about mixing types ("max_segment - sg->length" being u32), because 'block_size' is 64, bit, and that does seem to make some amount of sense: block_size = node->size << PAGE_SHIFT; with the 'node->size' being from drm_mm_node, and that size is a 'u64'. That I *could* see being more than 32 bits on a 64-bit architecture. Ok. But then that means that 'len' cannot be a 64-bit value either, and it should probably have been u32 len; .. len = min_t(u64, block_size, max_segment - sg->length); and that would just have been a lot nicer on 32-bit x86, avoiding a lot of pointlessly 64-bit things. That said, even those type simplifications do not fix the fundamental issue. That "DIV_ROUND_UP()" still ends up being a 64-bit divide, although now it's at least a "64-by-32" bit divide. Which needs to be handled by "do_div()" rather than the generic DIV_ROUND_UP() helper, because sadly, at least gcc still generates a full __udivdi3() even for the 64-by-32 divides. Can Intel GPU people please take a look? Linus
Re: [git pull] drm fixes for 5.19-rc7
On Sat, Jul 16, 2022 at 2:35 PM Linus Torvalds wrote: > > That said, even those type simplifications do not fix the fundamental > issue. That "DIV_ROUND_UP()" still ends up being a 64-bit divide, > although now it's at least a "64-by-32" bit divide. Hmm. The "DIV_ROUND_UP()" issue could be solved by just making the rule be that the max_segment size is always a power of two. Then you don't need the (expensive!) DIV_ROUND_UP(), and can just use the regular "round_up()" that works on powers-of-two. And the simplest way to do that is to just make "max_segments" be 2GB. The whole "round_down(UINT_MAX, page_alignment)" seems entirely pointless. Do you really want segments that are some odd number just under the 4GB mark, and force expensive divides? For consistency, I used the same value in i915_rsgt_from_buddy_resource(). I have no idea if that makes sense. Anyway, the attached patch is COMPLETELY UNTESTED. But it at least seems to compile. Maybe. Linus drivers/gpu/drm/i915/i915_scatterlist.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_scatterlist.c b/drivers/gpu/drm/i915/i915_scatterlist.c index f63b50b71e10..830dcd833d4e 100644 --- a/drivers/gpu/drm/i915/i915_scatterlist.c +++ b/drivers/gpu/drm/i915/i915_scatterlist.c @@ -81,7 +81,9 @@ struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct drm_mm_node *node, u64 region_start, u64 page_alignment) { - const u64 max_segment = round_down(UINT_MAX, page_alignment); + // Make sure max_segment (and thus segment_pages) is + // a power of two that fits in 32 bits. + const u64 max_segment = 1ul << 31; u64 segment_pages = max_segment >> PAGE_SHIFT; u64 block_size, offset, prev_end; struct i915_refct_sgt *rsgt; @@ -96,7 +98,7 @@ struct i915_refct_sgt *i915_rsgt_from_mm_node(const struct drm_mm_node *node, i915_refct_sgt_init(rsgt, node->size << PAGE_SHIFT); st = &rsgt->table; - if (sg_alloc_table(st, DIV_ROUND_UP(node->size, segment_pages), + if (sg_alloc_table(st, round_up(node->size, segment_pages), GFP_KERNEL)) { i915_refct_sgt_put(rsgt); return ERR_PTR(-ENOMEM); @@ -159,7 +161,7 @@ struct i915_refct_sgt *i915_rsgt_from_buddy_resource(struct ttm_resource *res, { struct i915_ttm_buddy_resource *bman_res = to_ttm_buddy_resource(res); const u64 size = res->num_pages << PAGE_SHIFT; - const u64 max_segment = round_down(UINT_MAX, page_alignment); + const u64 max_segment = 1u << 31; struct drm_buddy *mm = bman_res->mm; struct list_head *blocks = &bman_res->blocks; struct drm_buddy_block *block;
Re: [GIT PULL] Please pull hmm changes
On Wed, Dec 18, 2019 at 6:59 AM Jason Gunthorpe wrote: > > Do you think calling it 'mmn_subscriptions' is clear? Why do you want that "mmn"? Guys, the "mmn" part is clear from the _context_. The function name is When the function name is something like "mmu_interval_read_begin()", and the filename is "mm/mmu_notifier.c", you do NOT NEED silly prefixes like "mmn" for local variables. They add NOTHING. And they make the code an illegible mess. Yes, global function names need to be unique, and if they aren't really core, they want some prefix that explains the context, because global functions are called from _outside_ the context that explains them. But if it's a "struct mmu_interval_notifier" pointer, and it's inside a file that is all about these pointers, it shouldn't be called "mmn_xyz". That's not a name. That's line noise. So call it a "notifier". Maybe even an "interval_notifier" if you don't mind the typing. Name it by something _descriptive_. And if you want. And "subscriptions" is a lovely name. What does the "mmn" buy you? Just to clarify: the names I really hated were the local variable names (and the argument names) that were all entirely within the context of mm/mmu_notifier.c. Calling something "mmn_mm" is a random jumble of letters that looks more like you're humming than you're speaking. Don't mumble. Speak _clearly_. The other side of "short names" is that some non-local conventions exist because they are _so_ global. So if it's just a mm pointer, call it "mm". We do have some very core concepts in the kernel that permeate _everything_, and those core things we tend to have very short names for. So whenever you're working with VM code, you'll see lots of small names like "mm", "vma", "pte" etc. They aren't exactly clear, but they are _globally_ something you read and learn when you work on the Linux VM code. That's very diofferent from "mmn" - the "mmn" thing isn't some global shorthand, it is just a local abomination. So "notifier_mm" makes sense - it's the mm for a notifier. But "mmn_notifier" does not, because "mmn" only makes sense in a local context, and in that local context it's not any new information at all. See the difference? Two shorthands, but one makes sense and adds information, while the other is just unnecessary and pointless and doesn't add anything at all. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [GIT PULL] Please pull hmm changes
On Wed, Dec 18, 2019 at 10:37 AM Jason Gunthorpe wrote: > > I think this is what you are looking for? I think that with these names, I would have had an easier time reading the original patch that made me go "Eww", yes. Of course, now that it's just a rename patch, it's not like the patch is all that legible, but yeah, I think the naming is saner. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v18 00/19] kunit: introduce KUnit, the Linux kernel unit testing framework
On Fri, Oct 4, 2019 at 2:39 PM Theodore Y. Ts'o wrote: > > This question is primarily directed at Shuah and Linus > > What's the current status of the kunit series now that Brendan has > moved it out of the top-level kunit directory as Linus has requested? We seemed to decide to just wait for 5.5, but there is nothing that looks to block that. And I encouraged Shuah to find more kunit cases for when it _does_ get merged. So if the kunit branch is stable, and people want to start using it for their unit tests, then I think that would be a good idea, and then during the 5.5 merge window we'll not just get the infrastructure, we'll get a few more users too and not just examples. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v18 00/19] kunit: introduce KUnit, the Linux kernel unit testing framework
On Sun, Oct 6, 2019 at 9:55 AM Theodore Y. Ts'o wrote: > > Well, one thing we *can* do is if (a) if we can create a kselftest > branch which we know is stable and won't change, and (b) we can get > assurances that Linus *will* accept that branch during the next merge > window, those subsystems which want to use kself test can simply pull > it into their tree. Yes. At the same time, I don't think it needs to be even that fancy. Even if it's not a stable branch that gets shared between different developers, it would be good to just have people do a "let's try this" throw-away branch to use the kunit functionality and verify that "yeah, this is fairly convenient for ext4". It doesn't have to be merged in that form, but just confirmation that the infrastructure is helpful before it gets merged would be good. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [v4] vgacon: Fix a UAF in vgacon_invert_region
On Fri, Mar 6, 2020 at 4:38 AM Daniel Vetter wrote: > > Linus, since this missed the -fixes pull from Dave maybe double check I'm > not grossly wrong here and apply directly? Hmm. I don't have the original email, mind just sending it to me (with the proper added sign-off chain)? It does strike me that there's nothing that seems to check for overflow in the "(width << 1) * height" calculation. Hmm? Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [v4] vgacon: Fix a UAF in vgacon_invert_region
On Fri, Mar 6, 2020 at 7:12 AM Daniel Vetter wrote: > > I'll stuff it into a pull and throw that your way, that's simplest. Thanks. > btw we did add dri-devel to lore a while back, so should be there: Indeed. I tried (incompetently) to look up your message ID, but I didn't put the dri-devel part and saw the 404, and assumed it wasn't there. My bad. > > It does strike me that there's nothing that seems to check for > > overflow in the "(width << 1) * height" calculation. Hmm? > > Indeed I failed to hunt for that :-/ But I think we're good, in > vc_do_resize() we have > > if (cols > VC_RESIZE_MAXCOL || lines > VC_RESIZE_MAXROW) > return -EINVAL; Perfect. I just looked at the quoted patch itself. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [Bug 206175] Fedora >= 5.4 kernels instantly freeze on boot without producing any display output
Dave, Alex, there's an odd bugreport on bugzilla, where Artem is seeing an odd early-boot failure. That one almost certainly has nothing to do with you guys, but see the later odd (and apparently unrelated) report about some AMD graphics firmware issue and a black screen. Linus On Tue, Jan 14, 2020 at 1:17 PM wrote: > > https://bugzilla.kernel.org/show_bug.cgi?id=206175 > > --- Comment #9 from Matt Yates (m...@fast-mail.org) --- > My BIOS vendor is "Insyde Corp.". There is a TPM module. When I disabled it, > it caused my EFI boot entry to disappear, so I couldn't test it. > > However, I think we may have two separate problems. I switched back from > Fedora to Debian Testing, and the Debian installer upgraded the kernel from > 5.3 > to 5.4 series prior to the first boot. The 5.4 kernel booted up on first > boot. > I could see boot messages scrolling, but the screen went to a black while > trying to load lightdm because I did not have the "firmware-amd-graphics" > package installed required for graphics. After installing the amd graphics > package, the 5.4 kernel freezes as before (right at the start of the boot > process). The 5.3 kernel boots as normal, and graphics work. > > The "firmware-amd-graphics" package (version 20190717-2) was the only thing I > changed, so I guess the problem must be some sort of conflict with the amd > graphics firmware and the 5.4 kernel. > > -- > You are receiving this mail because: > You are on the CC list for the bug. ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v3 2/7] uaccess: Tell user_access_begin() if it's for a write or not
On Thu, Jan 23, 2020 at 4:59 AM Christophe Leroy wrote: > > On 32 bits powerPC (book3s/32), only write accesses to user are > protected and there is no point spending time on unlocking for reads. Honestly, I'm starting to think that 32-bit ppc just needs to look more like everybody else, than make these changes. We used to have a read/write argument to the old "verify_area()" and "access_ok()" model, and it was a mistake. It was due to odd i386 user access issues. We got rid of it. I'm not convinced this is any better - it looks very similar and for odd ppc access issues. But if we really do want to do this, then: > Add an argument to user_access_begin() to tell when it's for write and > return an opaque key that will be used by user_access_end() to know > what was done by user_access_begin(). You should make it more opaque than "unsigned long". Also, it shouldn't be a "is this a write". What if it's a read _and_ a write? Only a write? Only a read? Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v3 2/7] uaccess: Tell user_access_begin() if it's for a write or not
On Thu, Jan 23, 2020 at 11:47 AM christophe leroy wrote: > > I'm going to leave it aside, at least for the time being, and do it as a > second step later after evaluating the real performance impact. I'll > respin tomorrow in that way. Ok, good. >From a "narrow the access window type" standpoint it does seem to be a good idea to specify what kind of user accesses will be done, so I don't hate the idea, it's more that I'm not convinced it matters enough. On x86, we have made the rule that user_access_begin/end() can contain _very_ few operations, and objtool really does enforce that. With objtool and KASAN, you really end up with very small ranges of user_access_begin/end(). And since we actually verify it statically on x86-64, I would say that the added benefit of narrowing by access type is fairly small. We're not going to have complicated code in that user access region, at least in generic code. > > Also, it shouldn't be a "is this a write". What if it's a read _and_ a > > write? Only a write? Only a read? > > Indeed that was more: does it includes a write. It's either RO or RW I would expect that most actual users would be RO or WO, so it's a bit odd to have those choices. Of course, often writing ends up requiring read permissions anyway if the architecture has problems with alignment handling or similar, but still... The real RW case does exist conceptually (we have "copy_in_user()", after all), but still feels like it shouldn't be seen as the only _interface_ choice. IOW, an architecture may decide to turn WO into RW because of architecture limitations (or, like x86 and arm, ignore the whole RO/RW/WO _entirely_ because there's just a single "allow user space accesses" flag), but on an interface layer if we add this flag, I really think it should be an explicit "read or write or both". So thus my "let's try to avoid doing it in the first place, but if we _do_ do this, then do it right" plea. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] drm for 5.6-rc1
On Wed, Jan 29, 2020 at 9:58 PM Dave Airlie wrote: > > It has two known conflicts, one in i915_gem_gtt, where you should juat > take what's in the pull (it looks messier than it is), That doesn't seem right. If I do that, I lose the added GEM_BUG_ON()'s. I think the proper merge resolution does this: diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c index f10b2c41571c..f4fec7eb4064 100644 --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c @@ -131,6 +131,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm, vaddr = kmap_atomic_px(i915_pt_entry(pd, act_pt)); do { + GEM_BUG_ON(iter.sg->length < I915_GTT_PAGE_SIZE); vaddr[act_pte] = pte_encode | GEN6_PTE_ADDR_ENCODE(iter.dma); iter.dma += I915_GTT_PAGE_SIZE; diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index 077b8f7cf6cb..4d1de2d97d5c 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -379,6 +379,7 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt, pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2)); vaddr = kmap_atomic_px(i915_pt_entry(pd, gen8_pd_index(idx, 1))); do { + GEM_BUG_ON(iter->sg->length < I915_GTT_PAGE_SIZE); vaddr[gen8_pd_index(idx, 0)] = pte_encode | iter->dma; iter->dma += I915_GTT_PAGE_SIZE; diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c index 79096722ce16..531d501be01f 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c @@ -787,7 +787,7 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size) * readback check when writing GTT PTE entries. */ if (IS_GEN9_LP(i915) || INTEL_GEN(i915) >= 10) - ggtt->gsm = ioremap_nocache(phys_addr, size); + ggtt->gsm = ioremap(phys_addr, size); else ggtt->gsm = ioremap_wc(phys_addr, size); if (!ggtt->gsm) { since those ppgtt_insert_entries functions had moved to their gen-specific files. No? Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] drm for 5.6-rc1
On Thu, Jan 30, 2020 at 8:13 AM Linus Torvalds wrote: > > That doesn't seem right. If I do that, I lose the added GEM_BUG_ON()'s. Just for your ref: see commit ecc4d2a52df6 ("drm/i915/userptr: fix size calculation") for the source of those debug statements, and then 2c86e55d2ab5 ("drm/i915/gtt: split up i915_gem_gtt") on the other side that just moved the functions.. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: drm fixes for 5.3-rc9
On Thu, Sep 12, 2019 at 8:56 AM Dave Airlie wrote: > > Hey Linus, > > From the maintainer summit, just some last minute fixes for final, > details in the tag. So because my mailbox was more unruly than normal (because of same maintainer summit travel), I almost missed this email entirely. Why? Because you don't have the normal "git pull" anywhere in the email, so it doesn't trigger my search for important emails. There's a "git" in the email body, but there's not a "pull" anywhere. Could you add either a "please pull" or something to the email body - or to make things _really_ obvious, add the "[GIT PULL]" prefix to the subject line? Or anything, really, to whatever script or workflow you use to generate these? Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: drm fixes for 5.3-rc9
On Sun, Sep 15, 2019 at 8:12 AM Dave Airlie wrote: > > I've been manually writing the subject lines, seems I need to fix my brain. Note that my "find git pull requests" logic doesn't need it in the subject line at all, so if you just change whatever script you use to generate the email body to have an additional "pull" in there somewhere, that's perfectly workable too. > The reason I do that is I generate on one machine the body, and send > it via the gmail webui on whatever machine I'm using. This helps > avoids google tagging my emails as spam for generating them using > someone elses smtp servers. I should probably setup properly sending > gmail to avoid that. Don't worry too much about it too much. I _do_ try to always read all my email, it's just that I find things faster that match that pattern. And I don't think I've actually lost one of your pull requests (knock wood), they might just end up delayed a bit. That said - having "[GIT PULL]" in the subject line is how the pr-tracker-bot finds the emails too, so if you do the subject line thing, you'll not only trigger my search term, you'll also get the nice notifications from the bot when I've pushed out my merge. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] drm tree for 5.4-rc1
On Thu, Sep 19, 2019 at 12:09 PM Dave Airlie wrote: > > There are a few merge conflicts across the board, we have a shared > rerere cache which meant I hadn't noticed them until I avoided the > cache. > https://cgit.freedesktop.org/drm/drm/log/?h=drm-5.4-merge > contains what we've done, none of them are too crazy. Hmm. My merge isn't identical to that. It's close though. Different order for one #define which might be just from you and me merging different directions. But I also ended up removing the .gem_prime_export initialization to drm_gem_prime_export, because it's the default if none exists. That's the left-over from 3baeeb21983a ("drm/mtk: Drop drm_gem_prime_export/import") after the import stayed around because it got turned into an actually non-default one. I think that both of our merges are right - equivalent but just slightly different. But the reason I'm pointing this out is that I also get the feeling that if it needs that dev->dev_private difference from the default function in prime_import(), wouldn't it need the same for prime_export too? I don't know the code, and I don't know the hardware, but just from a "pattern matching" angle I just wanted to check whether maybe there's need for a mtk_drm_gem_prime_export() wrapper that does that same thing with struct mtk_drm_private *private = dev->dev_private; .. use private->dev instead of dev->dev .. So I'm just asking that somebody that knows that drm/mtk code should double-check that oddity. Thanks, Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Ack to merge through DRM? WAS Re: [PATCH v2 1/5] mm: Add write-protect and clean utilities for address space ranges
On Thu, Sep 26, 2019 at 5:03 AM Thomas Hellström (VMware) wrote: > > I wonder if I can get an ack from an mm maintainer to merge this through > DRM along with the vmwgfx patches? Andrew? Matthew? It would have helped to actually point to the patch itself, instead of just quoting the commit message. Looks like this: https://lore.kernel.org/lkml/20190926115548.44000-2-thomas...@shipmail.org/ but why is the code in question not just using the regular page walkers. The commit log shows no explanation of what's so special about this? Is the only reason the locking magic? Because if that's the reason, then afaik we already have a function for that: it's __walk_page_range(). Yes, it's static right now, but that's imho not a reason to duplicate all the walking (badly). Is there some other magic reason that isn't documented? Linus
Re: Ack to merge through DRM? WAS Re: [PATCH v2 1/5] mm: Add write-protect and clean utilities for address space ranges
On Thu, Sep 26, 2019 at 1:09 PM Thomas Hellström (VMware) wrote: > > That said, if people are OK with me modifying the assert in > pud_trans_huge_lock() and make __walk_page_range non-static, it should > probably be possible to make it work, yes. I don't think you need to modify that assert at all. That thing only exists when there's a "pud_entry" op in the walker, and then you absolutely need to have that mmap_lock. As far as I can tell, you fundamentally only ever work on a pte level in your address space walker already and actually have a WARN_ON() on the pud_huge thing, so no pud entry can possibly apply. So no, the assert in pud_trans_huge_lock() does not seem to be a reason not to just use the existing page table walkers. And once you get rid of the walking, what is left? Just the "iterate over the inode mappings" part. Which could just be done in mm/pagewalk.c, and then you don't even need to remove the static. So making it be just another walking in pagewalk.c would seem to be the simplest model. Call it "walk_page_mapping()". And talk extensively about how the locking differs a lot from the usual "walk_page_vma()" things. The then actual "apply" functions (what a horrid name) could be in the users. They shouldn't be mixed in with the walking functions anyway. They are callbacks, not walkers. Linus
Re: Ack to merge through DRM? WAS Re: [PATCH v2 1/5] mm: Add write-protect and clean utilities for address space ranges
On Thu, Sep 26, 2019 at 1:55 PM Thomas Hellström (VMware) wrote: > > Well, we're working on supporting huge puds and pmds in the graphics > VMAs, although in the write-notify cases we're looking at here, we would > probably want to split them down to PTE level. Well, that's what the existing walker code does if you don't have that "pud_entry()" callback. That said, I assume you would *not* want to do that if the huge pud/pmd is already clean and read-only, but just continue. So you may want to have a special pud_entry() that handles that case. Eventually. Maybe. Although honestly, if you're doing dirty tracking, I doubt it makes much sense to use largepages. > Looking at zap_pud_range() which when called from unmap_mapping_pages() > uses identical locking (no mmap_sem), it seems we should be able to get > away with i_mmap_lock(), making sure the whole page table doesn't > disappear under us. So it's not clear to me why the mmap_sem is strictly > needed here. Better to sort those restrictions out now rather than when > huge entries start appearing. zap_pud_range()actually does have that VM_BUG_ON_VMA(!rwsem_is_locked(&tlb->mm->mmap_sem), vma); exactly for the case where it might have to split the pud entry. Zapping the whole thing it does do without the assert. I'm not going to swear the mmap_sem is absolutely required, since a shared vma should be stable due to the i_mmap_lock, but splitting the hugepage really is a fairly big deal. It can't happen if you zap the *whole* mapping, but it can happen if you have a start/end range. Like you do. Also, in general it's probably not a great idea to look at zap_page_range() (and copy_page_range()) for ideas. They are kind of special, since they tend to be used for fundamental whole-address-space operations (ie fork/exit) and so as a result they get to do special things that a normal page walker generally shouldn't do. It's why they've never gotten translated to use the generic walker code. Linus
Re: Ack to merge through DRM? WAS Re: [PATCH v2 1/5] mm: Add write-protect and clean utilities for address space ranges
On Fri, Sep 27, 2019 at 5:17 AM Kirill A. Shutemov wrote: > > > Call it "walk_page_mapping()". And talk extensively about how the > > locking differs a lot from the usual "walk_page_vma()" things. > > Walking mappings of a page is what rmap does. This code thas to be > integrated there. Well, that's very questionable. The rmap code mainly does the "page -> virtual" mapping. One page at a time. The page walker code does the "virtual -> pte" mapping. Always a whole range at a time. The new code wants a combination of both. It very much is about walking ranges - as in mm/pagewalk.c. It's just that it walks potentially multiple ranges, based on where the address space is mapped. I think it has way more commonalities with the page walking code than it has with the rmap code. But yes, there is some of that "look up mappings based on address space" in there too, but it's the least part of it And as Thomas pointed out, it also has commonalities with unmap_mapping_pages() in mm/memory.c. In many ways that part is the closest. I'd say that from a code sharing standpoint, mm/rmap.c is absolutely the wrong place. It's the furthest away from what Thomas wants to do. The mm/pagewalk.c code has the most actual code that could be shared, and the addition would be smallest there. And conceptually the closest analogue in terms of what it _does_ is unmap_mapping_range() in mm/memory.c, but I see no room for sharing actual code there unless we completely change how we do zap_page_range() and add a lot of configurability there (which we don't want, because page table teardown at exit is really a pretty critical operation - I commonly see copy_page_range() and zap_page_range() on profiles if you have things like script-heavyu traditional UNIX loads). So I think conceptually, mm/memory.c and unmap_mapping_range() is closest but I don't think it's practical to share code. And between mm/pagewalk.c and mm/rmap.c, I think the page walking has way more of actual practical code sharing, and is also conceptually closer because most of the code is about walking a range, not looking up the mapping of one page. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Ack to merge through DRM? WAS Re: [PATCH v2 1/5] mm: Add write-protect and clean utilities for address space ranges
On Mon, Sep 30, 2019 at 6:04 AM Kirill A. Shutemov wrote: > > Have you seen page_vma_mapped_walk()? I made it specifically for rmap code > to cover cases when a THP is mapped with PTEs. To me it's not a big > stretch to make it cover multiple pages too. I agree that is closer, but it does make for calling that big complex function for every iteration step. Of course, you are right that the callback approach is problematic too, now that we have retpoline issues, making those very expensive. But at least that hopefully gets fixed some day and gets to be a rare problem. Matter ot taste, I guess. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] drm for 5.5-rc1
On Wed, Nov 27, 2019 at 4:59 PM Dave Airlie wrote: > > my sample merge is here: > https://cgit.freedesktop.org/~airlied/linux/log/?h=drm-next-5.5-merged Hmm. I think you missed a couple: you left a duplicate intel_update_rawclk() around (it got moved into intel_power_domains_init_hw() by commit 2f216a850715 ("drm/i915: update rawclk also on resume"), and you left the "select REFCOUNT_FULL" that no longer exists. And apparently nobody bothered to tell me about the semantic conflict with the media tree due to the changed calling convention of cec_notifier_cec_adap_unregister(). Didn't that show up in linux-next? Anyway, merged and pushed out, Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [git pull] mm + drm vmwgfx coherent
On Thu, Nov 28, 2019 at 5:15 PM Dave Airlie wrote: > > This is just a separated pull for the mm pagewalking + drm/vmwgfx work > Thomas did and you were involved in, I've left it separate in case you > don't feel as comfortable with it as the other stuff. Thanks, pulled (and the delay wasn't because of me being nervous about the code, it was just because of turkey and a day of rest afterwards). And I appreciate the separation - not because I wasn't comfortable with the final code, but simply because it's a rather different thing than the usual drm code. Having that as a separate pull and not mixed up with the regular driver updates is just how I prefer it. Thanks, Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [GIT PULL] Please pull hmm changes
On Mon, Nov 25, 2019 at 12:42 PM Jason Gunthorpe wrote: > > You will probably be most interested in the patch "mm/mmu_notifier: add an > interval tree notifier". I'm trying to read that patch, and I'm completely unable to by the absolutely *horrid* variable names. There are zero excuses for names like "mmn_mm". WTF? I'll try to figure the code out, but my initial reaction was "yeah, not in my VM". Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [GIT PULL] Please pull hmm changes
On Sat, Nov 30, 2019 at 10:03 AM Linus Torvalds wrote: > > I'll try to figure the code out, but my initial reaction was "yeah, > not in my VM". Why is it ok to sometimes do WRITE_ONCE(mni->invalidate_seq, cur_seq); (to pair with the unlocked READ_ONCE), and sometimes then do mni->invalidate_seq = mmn_mm->invalidate_seq; My initial guess was that latter is only done at initialization time, but at least in one case it's done *after* the mni has been added to the mmn_mm (oh, how I despise those names - I can only repeat: WTF?). See __mmu_interval_notifier_insert() in the mmn_mm->active_invalidate_ranges case. I'm guessing that it doesn't matter, because when inserting the notifier the sequence number is presumably not used until after the insertion (and any use though mmn_mm is protected by the mmn_mm->lock), but it still looks odd to me. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [GIT PULL] Please pull hmm changes
On Mon, Nov 25, 2019 at 12:42 PM Jason Gunthorpe wrote: > > Here is this batch of hmm updates, I think we are nearing the end of this > project for now, although I suspect there will be some more patches related to > hmm_range_fault() in the next cycle. I've ended up pulling this, but I'm not entirely happy with the code. You've already seen the comments on it in the earlier replies. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: drm pull for v5.3-rc1
On Tue, Aug 6, 2019 at 12:38 AM Christoph Hellwig wrote: > > Seems like no one took this up. Below is a version which I think is > slightly better by also moving the mm_walk structure initialization > into the helpers, with an outcome of just a handful of added lines. Ack. Agreed, I think that's a nicer interface. In fact, I do note that a lot of the users don't actually use the "void *private" argument at all - they just want the walker - and just pass in a NULL private pointer. So we have things like this: > + if (walk_page_range(&init_mm, va, va + size, &set_nocache_walk_ops, > + NULL)) { and in a perfect world we'd have arguments with default values so that we could skip those entirely for when people just don't need it. I'm not a huge fan of C++ because of a lot of the complexity (and some really bad decisions), but many of the _syntactic_ things in C++ would be nice to use. This one doesn't seem to be one that the gcc people have picked up as an extension ;( Yes, yes, we could do it with a macro, I guess. #define walk_page_range(mm, start,end, ops, ...) \ __walk_page_range(mm, start, end, (NULL , ## __VA_ARGS__)) but I'm not sure it's worthwhile. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: drm pull for v5.3-rc1
On Tue, Aug 6, 2019 at 11:40 PM Christoph Hellwig wrote: > > I'm not an all that huge fan of super magic macro loops. But in this > case I don't see how it could even work, as we get special callbacks > for huge pages and holes, and people are trying to add a few more ops > as well. Yeah, in this case we definitely don't want to make some magic loop walker. Loops are certainly simpler than callbacks for most cases (and often faster because you don't have indirect calls which now are getting quite expensive), but the walker code really does end up having tons of different cases that you'd have to handle with magic complex conditionals or switch statements instead. So the "walk over range using this set of callbacks" is generally the right interface. If there is some particular case that might be very simple and the callback model is expensive due to indirect calls for each page, then such a case should probably use the normal page walking loops (that we *used* to have everywhere - the "walk_range()" interface is the "new" model for all the random odd special cases). Linus
Re: [git pull] drm fixes for 5.13-rc6
On Thu, Jun 10, 2021 at 8:41 PM Dave Airlie wrote: > > git://anongit.freedesktop.org/drm/drm tags/drm-fixes-2021-06-11 I think anongit.freedesktop.org is sick. Can you ask somebody to give it some tender loving? It's just disconnecting immediately.. Linus
Re: [git pull] drm fixes for 5.13-rc6
On Fri, Jun 11, 2021 at 10:07 AM Linus Torvalds wrote: > > I think anongit.freedesktop.org is sick. Can you ask somebody to give > it some tender loving? It's just disconnecting immediately.. Either somebody gave the site a hug, or it decided to just get better on its own. It's working now, Linus
Re: drm next for 6.3-rc1
On Fri, Feb 24, 2023 at 5:30 PM Dave Airlie wrote: > > Any issues with this? I get nervous around 48hrs :-) It was merged on Wednesday evening. See commit a5c95ca18a98. If you were waiting for a pr-tracker-bot reply, I think you need to put "{GIT PULL]" in the subject line for the automation to trigger on it. Linus
Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)
[ Back from travel, so trying to make sense of this series.. ] On Sun, Jan 8, 2023 at 7:33 PM Byungchul Park wrote: > > I've been developing a tool for detecting deadlock possibilities by > tracking wait/event rather than lock(?) acquisition order to try to > cover all synchonization machanisms. It's done on v6.2-rc2. Ugh. I hate how this adds random patterns like if (timeout == MAX_SCHEDULE_TIMEOUT) sdt_might_sleep_strong(NULL); else sdt_might_sleep_strong_timeout(NULL); ... sdt_might_sleep_finish(); to various places, it seems so very odd and unmaintainable. I also recall this giving a fair amount of false positives, are they all fixed? Anyway, I'd really like the lockdep people to comment and be involved. We did have a fairly recent case of "lockdep doesn't track page lock dependencies because it fundamentally cannot" issue, so DEPT might fix those kinds of missing dependency analysis. See https://lore.kernel.org/lkml/60d41f05f139a...@google.com/ for some context to that one, but at teh same time I would *really* want the lockdep people more involved and acking this work. Maybe I missed the email where you reported on things DEPT has found (and on the lack of false positives)? Linus
Re: [git pull] drm for 6.1-rc1
On Tue, Oct 4, 2022 at 8:42 PM Dave Airlie wrote: > > This is very conflict heavy, mostly the correct answer is picking > the version from drm-next. Ugh, yes, that was a bit annoying. I get the same end result as you did, but I do wonder if the drm people should try to keep some kind of separate "fixes" branches for things that go both into the development tree and then get sent to me for fixes pulls? Hopefully this "lots of pointless noise" was a one-off, but it might be due to how you guys work.. Linus
Re: [git pull] drm for 6.1-rc1
On Tue, Oct 4, 2022 at 8:42 PM Dave Airlie wrote: > > Lots of stuff all over, some new AMD IP support and gang > submit support [..] Hmm. I have now had my main desktop lock up twice after pulling this. Nothing in the dmesg after a reboot, and nothing in particular that seems to trigger it, so I have a hard time even guessing what's up, but the drm changes are the primary suspect. I will try to see if I can get any information out of the machine, but with the symptom being just a dead machine ... This is the same (old) Radeon device: 49:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7) with dual 4k monitors, running on my good old Threadripper setup. Again, there's no explicit reason to blame the drm pull, except that it started after that merge (that machine ran the kernel with the networking pull for a day with no problems, and while there were other pull requests in between them, they seem to be fairly unrelated to the hardware I have). But the lockup is so sporadic (twice in the last day) that I really can't bisect it, so I'm afraid I have very very little info. Any suggestions? Linus
Re: mainline build failure due to 5d8c3e836fc2 ("drm/amd/display: fix array-bounds error in dc_stream_remove_writeback()")
On Thu, Oct 6, 2022 at 1:51 AM Sudip Mukherjee (Codethink) wrote: > > This is only seen with gcc-11, gcc-12 builds are ok. Hmm. This seems to be some odd gcc issue. I *think* that what is going on is that the test j = 0 ; j < MAX_DWB_PIPES makes gcc decide that "hey, j is in the range [0,MAX_DWB_PIPES[", and then since MAX_DWB_PIPES is 1, that simplifies to "j must be zero". Good range analysis so far. But for 'i', we start off with that lower bound of 0, but the upper bound is not fixed (the test for "i" is: "i < stream->num_wb_info"). And then that "if (i != j)", so now gcc decides that it can simplify that to "if (i != 0)", and then simplifies *that* to "oh, the lower bound of 'i' in that code is actually 1. So then it decides that "stream->writeback_info[i]" must be out of bounds. Of course, the *reality* is that stream->num_wb_info should be <= MAX_DWB_PIPES, and as such (with the current MAX_DWB_PIPES value of 1) it's not that 'i' can be 1, it's that the code in question cannot be reached at all. What confuses me is that error message ("array subscript [0, 0] is outside array bounds of 'struct dc_writeback_info[1]') which seems to be aware that the value is actually 0. So this seems to be some gcc-11 range analysis bug, but I don't know what the fix is. I suspect some random code change would magically just make gcc realize it's ok after all, but since it all depends on random gcc confusion, I don't know what the random code change would be. The fix *MAY* be to just add a '&& i < MAX_DWB_PIPES' to that loop too, and then gcc will see that both i and j are always 0, and that the code is unreachable and not warn about it. Hmm? Can you test that? And the reason gcc-12 builds are ok probably isn't that gcc-12 gets this right, it's simply that gcc-12 gets so many *opther* things wrong that we already disabled -Warray-bounds with gcc-12 entirely. If somebody cannot come up with a fix, I suspect the solution is "gcc array bounds analysis is terminally buggy" and we just need to disable it for gcc-11 too. Kees, any idea? Who else might be interested in fixing a -Warray-bounds issue? Linus
Re: [git pull] drm for 6.1-rc1
On Thu, Oct 6, 2022 at 12:30 PM Dave Airlie wrote: > > netconsole? I've never been really successful with that in the past, and haven't used it for decades. I guess I could try if nothing else works. Linus
Re: [git pull] drm for 6.1-rc1
On Thu, Oct 6, 2022 at 12:28 PM Alex Deucher wrote: > > Maybe you are seeing this which is an issue with GPU TLB flushes which > is kind of sporadic: > https://gitlab.freedesktop.org/drm/amd/-/issues/2113 Well, that seems to be 5.19, and while timing changes (or whatever other software updates) could have made it start trigger, this machine has been pretty solid otgerwise. > Are you seeing any GPU page faults in your kernel log? Nothing even remotely like that "no-retry page fault" in that issue report. Of course, if it happens just before the whole thing locks up... Linus
Re: [git pull] drm for 6.1-rc1
On Thu, Oct 6, 2022 at 1:25 PM Dave Airlie wrote: > > > [ 1234.778760] BUG: kernel NULL pointer dereference, address: 0088 > [ 1234.778813] RIP: 0010:drm_sched_job_done.isra.0+0xc/0x140 [gpu_sched] As far as I can tell, that's the line struct drm_gpu_scheduler *sched = s_fence->sched; where 's_fence' is NULL. The code is 0: 0f 1f 44 00 00nopl 0x0(%rax,%rax,1) 5: 41 54push %r12 7: 55push %rbp 8: 53push %rbx 9: 48 89 fb mov%rdi,%rbx c:* 48 8b af 88 00 00 00 mov0x88(%rdi),%rbp <-- trapping instruction 13: f0 ff 8d f0 00 00 00 lock decl 0xf0(%rbp) 1a: 48 8b 85 80 01 00 00 mov0x180(%rbp),%rax and that next 'lock decl' instruction would have been the atomic_dec(&sched->hw_rq_count); at the top of drm_sched_job_done(). Now, as to *why* you'd have a NULL s_fence, it would seem that drm_sched_job_cleanup() was called with an active job. Looking at that code, it does if (kref_read(&job->s_fence->finished.refcount)) { /* drm_sched_job_arm() has been called */ dma_fence_put(&job->s_fence->finished); ... but then it does job->s_fence = NULL; anyway, despite the job still being active. The logic of that kind of "fake refcount" escapes me. The above looks fundamentally racy, not to say pointless and wrong (a refcount is a _count_, not a flag, so there could be multiple references to it, what says that you can just decrement one of them and say "I'm done"). Now, _why_ any of that happens, I have no idea. I'm just looking at the immediate "that pointer is NULL" thing, and reacting to what looks like a completely bogus refcount pattern. But that odd refcount pattern isn't new, so it's presumably some user on the amd gpu side that changed. The problem hasn't happened again for me, but that's not saying a lot, since it was very random to begin with. Linus
Re: mainline build failure due to 5d8c3e836fc2 ("drm/amd/display: fix array-bounds error in dc_stream_remove_writeback()")
On Thu, Oct 6, 2022 at 1:50 PM Sudip Mukherjee wrote: > > > And it looks like Sudip's proposed fix for this particular code is > > additionally fixing unsigned vs signed as well. I think -Warray-bounds > > did its job (though, with quite a confusing index range in the report). > > Not my. Linus's. I just tested. :) I suspect Kees meant Stephen's other patch that Hamza pointed at, and that is perhaps the cleaner version. That said, I hate how this forces us to write random code changes just to make a compiler just randomly _happen_ to not complain about it. Linus
Re: [git pull] drm fixes for 6.1-rc1
On Thu, Oct 13, 2022 at 5:29 PM Dave Airlie wrote: > > Round of fixes for the merge window stuff, bunch of amdgpu and i915 > changes, this should have the gcc11 warning fix, amongst other > changes. Some of those amd changes aren't "fixes". They are some major code changes. We're still in the merge window, so I'm letting it slide, but calling then "fixes" really stretches things. They are fixes exactly the same way completely new development can "fix" things. Linus
Re: [PATCH] drm/amd/display: Fix build breakage with CONFIG_DEBUG_FS=n
On Fri, Oct 14, 2022 at 8:22 AM Nathan Chancellor wrote: > > After commit 8799c0be89eb ("drm/amd/display: Fix vblank refcount in vrr > transition"), a build with CONFIG_DEBUG_FS=n is broken due to a > misplaced brace, along the lines of: Thanks, applied. Linus
Re: [PATCH mm-unstable v1 20/20] mm: rename FOLL_FORCE to FOLL_PTRACE
On Wed, Nov 16, 2022 at 2:30 AM David Hildenbrand wrote: > > Let's make it clearer that functionality provided by FOLL_FORCE is > really only for ptrace access. I'm not super-happy about this one. I do understand the "let's rename the bit so that no new user shows up". And it's true that the main traditional use is ptrace. But from the patch itself it becomes obvious that no, it's not *just* ptrace. At least not yet. It's used for get_arg_page(), which uses it to basically look up (and install) pages in the newly created VM. Now, I'm not entirely sure why it even uses FOLL_FORCE, - I think it might be historical, because the target should always be the new stack vma. Following the history of it is a big of a mess, because there's a number of renamings and re-organizations, but it seems to go back to 2007 and commit b6a2fea39318 ("mm: variable length argument support"). Before that commit, we kept our own array of "this is the set of pages that I will install in the new VM". That commit basically just inserts the pages directly into the VM instead, getting rid of the array size limitation. So at a minimum, I think that FOLL_FORCE would need to be removed before any renaming to FOLL_PTRACE, because that's not some kind of small random case. It *might* be as simple as just removing it, but maybe there's some reason for having it that I don't immediately see. There _are_ also small random cases too, like get_cmdline(). Maybe that counts as ptrace, but the execve() case most definitely does not. Linus
Re: [PATCH mm-unstable v1 20/20] mm: rename FOLL_FORCE to FOLL_PTRACE
On Thu, Nov 17, 2022 at 2:58 PM Kees Cook wrote: > > Oh, er, why does get_arg_page() even need FOLL_FORCE? This is writing the > new stack contents to the nascent brpm->vma, which was newly allocated > with VM_STACK_FLAGS, which an arch can override, but they all appear to > include > VM_WRITE | VM_MAYWRITE. Yeah, it does seem entirely superfluous. It's been there since the very beginning (although in that original commit b6a2fea39318 it was there as a '1' to the 'force' argument to get_user_pages()). I *think* it can be just removed. But as long as it exists, it should most definitely not be renamed to FOLL_PTRACE. There's a slight worry that it currently hides some other setup issue that makes it matter, since it's been that way so long, but I can't see what it is. Linus
Re: [PATCH RFC 16/19] mm/frame-vector: remove FOLL_FORCE usage
On Tue, Nov 22, 2022 at 4:25 AM Hans Verkuil wrote: > > I tracked the use of 'force' all the way back to the first git commit > (2.6.12-rc1) in the very old video-buf.c. So it is very, very old and the > reason is lost in the mists of time. Well, not entirely. For archaeology reasons, I went back to the old BK history, which exists as a git conversion in https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/ and there you can actually find it. Not with a lot of explanations, though - it's commit b7649ef789 ("[PATCH] videobuf update"): This updates the video-buf.c module (helper module for video buffer management). Some memory management fixes, also some adaptions to the final v4l2 api. but it went from err = get_user_pages(current,current->mm, -data, dma->nr_pages, -rw == READ, 0, /* don't force */ +data & PAGE_MASK, dma->nr_pages, +rw == READ, 1, /* force */ dma->pages, NULL); in that commit. So it goes back to October 2002. > Looking at this old LWN article https://lwn.net/Articles/28548/ suggests > that it might be related to calling get_user_pages for write buffers The timing roughly matches. > I assume that removing FOLL_FORCE from 'FOLL_FORCE|FOLL_WRITE' will still > allow drivers to read from the buffer? The issue with some of the driver hacks has been that - they only want to read, and the buffer may be read-only - they then used FOLL_WRITE despite that, because they want to break COW (due to the issue that David is now fixing with his series) - but that means that the VM layer says "nope, you can't write to this read-only user mapping" - ... and then they use FOLL_FORCE to say "yes, I can". iOW, the FOLL_FORCE may be entirely due to an (incorrect, but historically needed) FOLL_WRITE. Linus
Re: [RFC][PATCH v3 00/33] timers: Use timer_shutdown*() before freeing timers
On Thu, Nov 3, 2022 at 10:48 PM Steven Rostedt wrote: > > Ideally, I would have the first patch go into this rc cycle, which is mostly > non functional as it will allow the other patches to come in via the > respective > subsystems in the next merge window. Ack. I also wonder if we could do the completely trivially correct conversions immediately. I'm talking about the scripted ones where it's currently a "del_timer_sync()", and the very next action is freeing whatever data structure the timer is in (possibly with something like free_irq() in between - my point is that there's an unconditional free that is very clear and unambiguous), so that there is absolutely no question about whether they should use "timer_shutdown_sync()" or not. IOW, things like patches 03, 17 and 31, and at least parts others in this series. This series clearly has several much more complex cases that need actual real code review, and I think it would help to have the completely unambiguous cases out of the way, just to get rid of noise. So I'd take that first patch, and a scripted set of "this cannot change any semantics" patches early. Linus
Re: [RFC][PATCH v3 00/33] timers: Use timer_shutdown*() before freeing timers
On Fri, Nov 4, 2022 at 12:42 PM Steven Rostedt wrote: > > Linus, should I also add any patches that has already been acked by the > respective maintainer? No, I'd prefer to keep only the ones that are 100% unambiguously not changing any semantics. Linus
Re: [PATCH v4a 00/38] timers: Use timer_shutdown*() before freeing timers
On Fri, Nov 4, 2022 at 11:01 PM Steven Rostedt wrote: > > Patch 1 fixes an issue with sunrpc/xprt where it incorrectly uses > del_singleshot_timer_sync() for something that is not a oneshot timer. As this > will be converted to shutdown, this needs to be fixed first. So this is the kind of thing that I would *not* want to get eartly. I really would want to get just the infrastructure in to let people start doing conversions. And then the "mindlessly obvious patches that are done by scripting and can not possibly matter". The kinds that do not *need* review, because they are mechanical, and that just cause pointless noise for the rest of the patches that *do* want review. Not this kind of thing that is so subtle that you have to explain it. That's not a "scripted patch for no semantic change". So leave the del_singleshot_timer_sync() cases alone, they are irrelevant for the new infrastructure and for the "mindless scripted conversion" patches. > Patches 2-4 changes existing timer_shutdown() functions used locally in ARM > and > some drivers to better namespace names. Ok, these are relevant. > Patch 5 implements the new timer_shutdown() and timer_shutdown_sync() > functions > that disable re-arming the timer after they are called. This is obviously what I'd want early so that people can start doign this in their trees. > Patches 6-28 change all the locations where there's a kfree(), kfree_rcu(), > kmem_cache_free() and one call_rcu() call where the RCU function frees the > timer (the workqueue patch) in the same function as the del_timer{,_sync}() is > called on that timer, and there's no extra exit path between the del_timer and > freeing of the timer. So honestly, I was literally hoping for a "this is the coccinelle script" kind of patch. Now there seems to be a number of patches here that are actualyl really hard to see that they are "obviously correct" and I can't tell if they are actually scripted or not. They don't *look* scripted, but I can't really tell. I looked at the patches with ten lines of context, and I didn't see the immediately following kfree() even in that expanded patch context, so it's fairly far away. Others in the series were *definitely* not scripted, doing clearly manual cleanups: -if (dch->timer.function) { -del_timer(&dch->timer); -dch->timer.function = NULL; -} +timer_shutdown(&dch->timer); so no, this does *not* make me feel "ok, this is all trivial". IOW, I'd really want *just* the infrastructure and *just* the provably trivial stuff. If it wasn't some scripted really obvious thing that cannot possibly change anything and that wasn't then edited manually for some reason, I really don't want it early. IOW, any early conversions I'd take are literally about removing pure mindless noise. Not about doing conversions. And I wouldn't mind it as a single conversion patch that has the coccinelle script as the explanation. Really just THAT kind of "100% mindless conversion". Linus
Re: [PATCH v4a 00/38] timers: Use timer_shutdown*() before freeing timers
On Sat, Nov 5, 2022 at 11:04 AM Steven Rostedt wrote: > > Here's the changes I made after running the script Please. No. What part of "I don't want extra crud" was I unclear on? I'm not interested in converting everything. That's clearly a 6.,2 issue, possibly even longer considering how complicated the networking side has been. I'm not AT ALL interested in "oh, I then added my own small cleanups on top to random files because I happened to notice them". Repeat after me: "If the script didn't catch them, they weren't trivially obvious". And it does seem that right now the script itself is a bit too generous, which is why it didn't notice that sometimes there wasn't a kfree after all because of a goto around it. So clearly that "..." doesn't really work, I think it accepts "_any_ path leads to the second situation" rather than "_all_ paths lead to the second situation". But yeah, my coccinelle-foo is very weak too, and maybe there's no pattern for "no flow control". I would also like the coccinelle script to notice the "timer is used afterwards", so that it does *not* modify that case that does del_timer(&dch->timer); dch->timer.function = NULL; since now the timer is modified in between the del_timer() and the kfree. Again, that timer modification is then made pointless by changing the del_timer() to a "timer_shutdown()", but at that point it is no longer a "so obvious non-semantic change that it should be scripted". At that point it's a manual thing. So I think the "..." in your script should be "no flow control, and no access to the timer", but do not know how to do that in coccinelle. Julia? And this thread has way too many participants, I suspect some email systems will just mark it as spam as a result. Which is partly *why* I would like to get rid of noisy changes that really don't matter - but I would like it to be truly mindlessly obvious that there are *zero* questions about it, and absolutely no manual intervention because the patch is so strict that it's just unquestionably correct. Linus
Re: [PATCH v4a 00/38] timers: Use timer_shutdown*() before freeing timers
On Sat, Nov 5, 2022 at 2:03 PM Jason A. Donenfeld wrote: > > Something that might help here is changing the `...` into > `... when exists` or into `... when != ptr` or similar. I actually tried that. You don't want "when exists", you'd want "when forall", but that seems to be the default. And trying "when != ptr->timer" actually does the right thing in that it gets rid of the case where the timer is modified outside of the del_timer() case, *but* it also causes odd other changes to the output. Look at what it generates for that drivers/media/usb/pvrusb2/pvrusb2-hdw.c file, which finds a lot of triggers with the "when != ptr->timer", but only does one without it. So I gave up, just because I clearly don't understand the rules. (Comparing output is also fun because the ordering of the patches is random, so consecutive runs with the same rule will give different patches. I assume that it's just because it's done in parallel, but it doesn't help the "try to see what changes when you change the script" ;) Linus
Re: [PATCH RFC 00/19] mm/gup: remove FOLL_FORCE usage from drivers (reliable R/O long-term pinning)
On Mon, Nov 7, 2022 at 8:18 AM David Hildenbrand wrote: > > So instead, make R/O long-term pinning work as expected, by breaking COW > in a COW mapping early, such that we can remove any FOLL_FORCE usage from > drivers. Nothing makes me unhappy from a quick scan through these patches. And I'd really love to just have this long saga ended, and FOLL_FORCE finally relegated to purely ptrace accesses. So an enthusiastic Ack from me. Linus
Re: [git pull] drm for 6.2-rc1
On Mon, Dec 12, 2022 at 6:56 PM Dave Airlie wrote: > > There are a bunch of conflicts, one in amdgpu is a bit nasty, I've > cc'ed Christian/Alex to make sure they know to check whatever > resolution you find. The one I have is what we have in drm-tip tree. Hmm. My merge resolution is slightly different from yours. You seem to have basically dropped commit b09d6acba1d9 ("drm/amdgpu: handle gang submit before VMID"). Now, there are other fence changes in the drm tree that may mean that that commit *should* be dropped, so it's entirely possible that my resolution which kept that ordering change might be wrong and your resolution that just took the drm tip code is the right one. Christian? Alex? Can you please double-check what I just pushed out? Linus
Re: [git pull] drm for 6.2-rc1
On Wed, Dec 14, 2022 at 12:05 AM Christian König wrote: > > Anyway we need to re-apply b09d6acba1d9 which should be trivial. Note that my resolution did exactly that (*), it's just that when I double-checked against Dave's suggested merge that I noticed I'd done things differently than he did. (*) Well, when I say "did exactly that" I don't actually know some of the other fencing changes that happened, so there may be a reason why something further should still be done. So I can only point to my merge commit a594533df0f6 and ask people to verify. It does all at least work for me. Knock wood. Linus
Disabling -Warray-bounds for gcc-13 too
Kees, I made the mistake of upgrading my M2 Macbook Air to Fedora-38, and in the process I got gcc-13 which is not WERROR-clean because we only limited the 'array-bounds' warning to gcc-11 and gcc-12. But gcc-13 has all the same issues. And I want to be able to do my arm64 builds with WERROR on still... I guess it never made much sense to hope it was going to go away without having a confirmation, so I just changed it to be gcc-11+. A lot of the warnings seem just crazy, with gcc just not getting the bounds right, and then being upset about us going backwards with 'container_of()' etc. Ok, so the kernel is special. We do odd things. I get it, gcc ends up being confused. But before I disabled it, I did take a look at a couple of warnings that didn't look like the sea of crazy. And one of them is from you. In particular, commit 4076ea2419cf ("drm/nouveau/disp: Fix nvif_outp_acquire_dp() argument size") cannot possibly be right, It changes nvif_outp_acquire_dp(struct nvif_outp *outp, u8 dpcd[16], to nvif_outp_acquire_dp(struct nvif_outp *outp, u8 dpcd[DP_RECEIVER_CAP_SIZE], and then does memcpy(args.dp.dpcd, dpcd, sizeof(args.dp.dpcd)); where that 'args.dp.dpcd' is a 16-byte array, and DP_RECEIVER_CAP_SIZE is 15. So yeah, it's copying 16 bytes from an argument that claims to be 15 bytes in size. I think that commit was wrong, and the problem is that the 'dpcd' array is something 15 and sometimes 16. For example, we have struct nouveau_encoder { ... union { struct { ... u8 dpcd[DP_RECEIVER_CAP_SIZE]; } dp; }; so there it's indeed 15 bytes, but then we have union nvif_outp_acquire_args { struct nvif_outp_acquire_v0 { ... union { ... struct { ... __u8 dpcd[16]; } dp; where it's 16. I think it's all entirely harmless from a code generation standpoint, because the 15-byte field will be padded out to 16 bytes in the structure that contains it, but it's most definitely buggy. So that warning does find real cases of wrong code. But when those real cases are hidden by hundreds of lines of unfixable false positives, we don't have much choice. But could the Nouveau driver *please* pick a size for the dhcp[] array and stick with it? The other driver where the warnings didn't look entirely crazy was the ath/carl9170 wireless driver, but I didn't look closer at that one. Linus
Re: mainline build failure due to 322458c2bb1a ("drm/ttm: Reduce the number of used allocation orders for TTM pages")
On Wed, Apr 26, 2023 at 10:44 AM Sudip Mukherjee (Codethink) wrote: > > drivers/gpu/drm/ttm/ttm_pool.c:73:29: error: variably modified > 'global_write_combined' at file scope >73 | static struct ttm_pool_type global_write_combined[TTM_DIM_ORDER]; > | ^ Ugh. This is because we have #define TTM_DIM_ORDER (__TTM_DIM_ORDER <= MAX_ORDER ? __TTM_DIM_ORDER : MAX_ORDER) which looks perfectly fine as a constant ("pick the smaller of MAX_ORDER and __TTM_DIM_ORDER"). But: #define __TTM_DIM_ORDER (TTM_MAX_ORDER + 1) #define TTM_MAX_ORDER (PMD_SHIFT - PAGE_SHIFT) which still _looks_ fine, but on 64-bit powerpc, we then have #define PTE_INDEX_SIZE __pte_index_size so that __TTM_DIM_ORDER constant isn't actually a constant at all. I would suggest that the DRM code just make those fixed-size arrays use "MAX_ORDER", because even though it might be a bit bigger than necessary, that's still not a very big constant (ie typically it's "11", but it could be up to 64). It's a bit sad how that macro that _looks_ like a constant (and is one pretty much everywhere else) isn't actually constant on powerpc, but looking around it looks fairly unavoidable. Maybe the DRM code could try to avoid using things like PMD_SHIFT entirely? Linus
Re: [PATCH 0/2] docs & checkpatch: allow Closes tags with links
On Thu, Mar 16, 2023 at 4:43 AM Matthieu Baerts wrote: > > @Linus: in short, we would like to continue using the "Closes:" tag (or > similar, see below) with a URL in commit messages. They are useful to > have public bug trackers doing automated actions like closing a specific > ticket. Any objection from your side? As long as it's a public link, I guess that just documents what the drm people have been doing. I'm not convinced "Closes" is actually any better than just "Link:", though. I would very much hope and expect that the actual closing of any bug report is actually done separately and verified, rather than some kind of automated "well, the commit says it closes it, so.." So honestly, I feel like "Link:" is just a better thing, and I worry that "Closes:" is then going to be used for random internal crap. We've very much seen people wanting to do that - having their own private bug trackers, and then using the commit message to refer to them, which I am *violently* against. If it's only useful to some closed community, it shouldn't be in the public commits. And while the current GPU people seem to use "Closes:" the right way (and maybe some other groups do too - but it does seem to be mostly a freedesktop thing), I really think it is amenable to mis-use in ways "Link:" is not. The point of "Link:" is explicitly two-fold: - it makes it quite obvious that you expect an actual valid web-link, not some internal garbage - random people always want random extensions, and "Link:" is _designed_ to counter-act that creeping "let's add a random new tag" disease. It's very explicitly "any relevant link". and I really question the value of adding new types of tags, particularly ones that seem almost designed to be mis-used. So I'm not violently against it, and 99% of the existing uses seem fine. But I do note that some of the early "Closes:" tags in the kernel were very much complete garbage, and exactly the kind of thing that I absolutely detest. What does Closes: 10437 mean? That's crazy talk. (And yes, in that case it was a kernel.bugzilla.org number, which is perfectly fine, but I'm using it as a very real example of how "Closes:" ends up being very naturally to mis-use). End result: I don't hate our current "Closes:" uses. But I'm very wary of it. I'm not at all convinced that it really adds a lot of value over "Link:", and I am, _very_ aware of how easily it can be then taken to be a "let's use our own bug tracker cookies here". So I will neither endorse nor condemn it, but if I see people using it wrong, I will absolutely put my foot down. Linus
Re: [Intel-gfx] [BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete()
On Wed, Mar 15, 2023 at 5:22 PM Steven Rostedt wrote: > > I hope that this gets in by -rc3, as I want to start basing my next branch > on that tag. My tree should have it now as commit c00133a9e87e ("drm/ttm: drop extra ttm_bo_put in ttm_bo_cleanup_refs"). Linus
Re: Linux 6.3-rc3
On Mon, Mar 20, 2023 at 11:05 AM Nathan Chancellor wrote: > > On the clang front, I am still seeing the following warning turned error > for arm64 allmodconfig at least: > > drivers/gpu/host1x/dev.c:520:6: error: variable 'syncpt_irq' is > uninitialized when used here [-Werror,-Wuninitialized] > if (syncpt_irq < 0) > ^~ Hmm. I do my arm64 allmodconfig builds with gcc, and I'm surprised that gcc doesn't warn about this. That syncpt_irq thing isn't written to anywhere, so that's pretty egregious. We use -Wno-maybe-uninitialized because gcc gets it so wrong, but that's different from the "-Wuninitialized" thing (without the "maybe"). I've seen gcc mess this up when there is one single assignment, because then the SSA format makes it *so* easy to just use that assignment out-of-order (or unconditionally), but this case looks unusually clear-cut. So the fact that gcc doesn't warn about it is outright odd. > If that does not come to you through other means before -rc4, could you > just apply it directly so that I can stop applying it to our CI? :) Bah. I took it now, there's no excuse for that thing. Do we have any gcc people around that could explain why gcc failed so miserably at this trivial case? Linus
Re: Linux 6.3-rc3
On Mon, Mar 20, 2023 at 11:26 AM Linus Torvalds wrote: > > Hmm. I do my arm64 allmodconfig builds with gcc, and I'm surprised > that gcc doesn't warn about this. Side note: I'm also wondering why that TEGRA_HOST1X config has that ARM dependency in depends on ARCH_TEGRA || (ARM && COMPILE_TEST) because it seems to build just fine at least on x86-64 if I change it to be just depends on ARCH_TEGRA || COMPILE_TEST ie there seems to be nothing ARM-specific in there. Limiting it to just the tegra platform by default makes 100% sense, but that "only do compile-testing on ARM" seems a bit bogus. That limit goes back to forever (commit 6f44c2b5280f: "gpu: host1x: Increase compile test coverage" back in Nov 2013), so maybe things didn't use to work as well back in the dark ages? None of this explains why gcc didn't catch it, but at least allowing the build on x86-64 would likely have made it easier for people to see clang catching this. Linus
Re: Linux 6.3-rc3
On Mon, Mar 20, 2023 at 11:56 AM Nathan Chancellor wrote: > > I did see a patch fly by to fix that: > > https://lore.kernel.org/20230316082035.567520-3-christian.koe...@amd.com/ > > It seems like the DRM_TEGRA half of it is broken though: > > https://lore.kernel.org/202303170635.a2rsq1wu-...@intel.com/ Hmm. x86-64 has 'vmap()' too. So I think that DRM_TEGRA breakage is likely just due to a missing header file include that then (by luck and mistake) gets included on arm. You need for 'vmap()'. There might be something else going on, I didn't look deeply at it. Linus