[Nouveau] [Bug 103897] Kernel 4.14 causes high cpu usage, 4.12 was OK
https://bugs.freedesktop.org/show_bug.cgi?id=103897 --- Comment #7 from Andrew Randrianasulu --- This bug still around for me with 4.15.0 -- You are receiving this mail because: You are the assignee for the bug.___ Nouveau mailing list Nouveau@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers
On Wed, Feb 14, 2018 at 09:58:43AM -0500, Sean Paul wrote: > On Wed, Feb 14, 2018 at 03:43:56PM +0100, Michel Dänzer wrote: > > On 2018-02-14 03:08 PM, Sean Paul wrote: > > > On Wed, Feb 14, 2018 at 10:26:35AM +0100, Maarten Lankhorst wrote: > > >> Op 14-02-18 om 09:46 schreef Lukas Wunner: > > >>> On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote: > > Fix a deadlock on hybrid graphics laptops that's been present since > > 2013: > > >>> This series has been reviewed, consent has been expressed by the most > > >>> interested parties, patch [1/5] which touches files outside drivers/gpu > > >>> has been acked and I've just sent out a v2 addressing the only objection > > >>> raised. My plan is thus to wait another two days for comments and, > > >>> barring further objections, push to drm-misc this weekend. > > >>> > > >>> However I'm struggling with the decision whether to push to next or > > >>> fixes. The series is marked for stable, however the number of > > >>> affected machines is limited and for an issue that's been present > > >>> for 5 years it probably doesn't matter if it soaks another two months > > >>> in linux-next before it gets backported. Hence I tend to err on the > > >>> side of caution and push to next, however a case could be made that > > >>> fixes is more appropriate. > > >>> > > >>> I'm lacking experience making such decisions and would be interested > > >>> to learn how you'd handle this. > > >> > > >> I would say fixes, it doesn't look particularly scary. :) > > > > > > Agreed. If it's good enough for stable, it's good enough for -fixes! > > > > It's not that simple, is it? Fast-tracking patches (some of which appear > > to be untested) to stable without an immediate cause for urgency seems > > risky to me. > > /me should be more careful what he says > > Given where we are in the release cycle, it's barely a fast track. > If these go in -fixes, they'll get in -rc2 and will have plenty of > time to bake. 
If we were at rc5, it might be a different story. The patches are marked for stable though, so if they go in through drm-misc-fixes, they may appear in stable kernels before 4.16-final is out. Greg picks up patches once they're in Linus' tree, though often with a delay of a few days or weeks. If they go in through drm-misc-next, they're guaranteed not to appear in *any* release before 4.16-final is out. This allows for differentiation between no-brainer stable fixes that can be sent immediately and scarier, but similarly important stable fixes that should soak for a while. I'm not sure which category this series belongs to, though it's true what Maarten says, it's not *that* grave a change. Thanks, Lukas
Re: [Nouveau] Addressing the problem of noisy GPUs under Nouveau
On 07/02/18 05:31, John Hubbard wrote: > On 01/28/2018 04:05 PM, Martin Peres wrote: >> On 29/01/18 01:24, Martin Peres wrote: >>> On 28/11/17 07:32, John Hubbard wrote: On 11/23/2017 02:48 PM, Martin Peres wrote: > On 23/11/17 10:06, John Hubbard wrote: >> On 11/22/2017 05:07 PM, Martin Peres wrote: >>> Hey, >>> >>> Thanks for your answer, Andy! >>> >>> On 22/11/17 04:06, Ilia Mirkin wrote: On Tue, Nov 21, 2017 at 8:29 PM, Andy Ritger wrote: Martin's question was very long, but it boils down to this: How do we compute the correct values to write into the e114/e118 pwm registers based on the VBIOS contents and current state of the board (like temperature). >>> >>> Unfortunately, it can also be the e11c/e120 couple, or 0x200d8/dc on >>> GF119+, or 0x200cd/d0 on Kepler+. >>> >>> At least, it looks like we know which PWM controller we need to drive, so >>> I did not want to muddy the water even more by giving register >>> addresses, rather concentrating on the problem at hand: How to compute >>> the duty value for the PWM controller. >>> We generally do this right, but appear to get it extra-wrong for certain GPUs. >>> >>> Yes... So far, we are always safe, but users tend to mind when their >>> computer sounds like a jumbo jet at take off... Who would have thought? >>> :D >>> >>> Anyway, looking forward to your answer! >>> >>> Cheers, >>> Martin >> > [...] > > Hi Martin, > > I strongly suspect you are seeing a special behavior, which is: on > some GF108 boards we use only a very limited range of PWM, > 0.4 to 2.5%, due to the particular type of DC power conversion > circuit on those boards. However, it could also just be difficulties > in interpreting the fixed-point variables in the tables. In either > case, the answer is to explain those formats, so I'll do that now. > > I am attaching the fan cooler table, in HTML format. 
We have also > published the BIT (BIOS Information Table) format, separately: > > > http://download.nvidia.com/open-gpu-doc/BIOS-Information-Table/1/BIOS-Information-Table.html > > , but I don't think it has any surprises for you, in this regard. You > can check it, to be sure you're looking at the right subtable, though, > just in case. > > The interesting parts of that table are: > > PWM Scale Slope (16 bits): > > Slope to scale effective PWM to actual PWM (1/4096, F4.12, signed). > For backwards compatibility, a value of 0.0 (0x) is interpreted as 1.0 > (0x1000). > This value is used to scale the effective PWM duty cycle, a conceptual > fraction > of full speed (0% to 100%), to the actual electrical PWM duty cycle. > PWM(actual) = Slope × PWM(effective) + Offset > > PWM Scale Offset (16 bits): > > Offset to scale effective PWM to actual PWM (1/4096, F4.12, signed). > This value is used to scale the effective PWM duty cycle, a conceptual > fraction > of full speed (0% to 100%), to the actual electrical PWM duty cycle. > PWM(actual) = Slope × PWM(effective) + Offset > > > However, the calculations are hard to get right, and the table stores > values in fixed-point format, so I'm showing a few simplified code excerpts > that use these. The various fixed point macro definitions are found as part of > our normal driver package, in nvmisc.h and nvtypes.h. Any other definitions > that you need are included right here (I ran a quick compiler check to be > sure.) Wow John, thanks a lot! Sorry for the delay, I was on vacation when you posted this, but this definitely is what I was looking for! Thanks a lot for the code example, I will try to make use of it soon and come back to you if I still have issues! Martin
Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
> Actually this was brought up to me already, there's a fix on the mailing list
> for this I reviewed a little while ago from nvidia that we should pull in:
>
> https://patchwork.freedesktop.org/patch/203205/
>
> Would you guys mind confirming that this patch fixes your issues?

It works on my amd64, P4 is still compiling.

[1.124987] nouveau :04:05.0: NVIDIA NV05 (20154000)
[1.161464] nouveau :04:05.0: bios: version 03.05.00.10.00
[1.161475] nouveau :04:05.0: bios: DCB table not found
[1.161535] nouveau :04:05.0: bios: DCB table not found
[1.161577] nouveau :04:05.0: bios: DCB table not found
[1.161586] nouveau :04:05.0: bios: DCB table not found
[1.344008] tsc: Refined TSC clocksource calibration: 2200.078 MHz
[1.344024] clocksource: tsc: mask: 0x max_cycles: 0x1fb67c69f81, max_idle_ns: 440795210317 ns
[1.344037] clocksource: Switched to clocksource tsc
[1.408102] nouveau :04:05.0: tmr: unknown input clock freq
[1.409471] nouveau :04:05.0: fb: 32 MiB SDRAM
[1.414459] nouveau :04:05.0: DRM: VRAM: 31 MiB
[1.414467] nouveau :04:05.0: DRM: GART: 128 MiB
[1.414476] nouveau :04:05.0: DRM: BMP version 5.17
[1.414484] nouveau :04:05.0: DRM: No DCB data found in VBIOS
[1.415629] nouveau :04:05.0: DRM: Adaptor not initialised, running VBIOS init tables.
[1.415829] nouveau :04:05.0: bios: DCB table not found
[1.416125] nouveau :04:05.0: DRM: Saving VGA fonts
[1.477526] nouveau :04:05.0: DRM: No DCB data found in VBIOS
[1.478428] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[1.478438] [drm] Driver supports precise vblank timestamp query.
[1.479618] nouveau :04:05.0: DRM: MM: using M2MF for buffer copies
[1.517930] nouveau :04:05.0: DRM: allocated 1024x768 fb: 0x4000, bo a09f4d1f
[1.519294] nouveau :04:05.0: fb1: nouveaufb frame buffer device
[1.519313] [drm] Initialized nouveau 1.3.1 20120801 for :04:05.0 on minor 1

-- Meelis Roos (mr...@linux.ee)
[Nouveau] [Bug 105097] Computer hangs only mouse moves
https://bugs.freedesktop.org/show_bug.cgi?id=105097

Dmitry Yakimov changed:

What     |Removed |Added
Hardware |Other   |x86-64 (AMD64)
OS       |All     |Linux (All)

-- You are receiving this mail because: You are the assignee for the bug.
[Nouveau] [Bug 105097] Computer hangs only mouse moves
https://bugs.freedesktop.org/show_bug.cgi?id=105097 --- Comment #1 from Dmitry Yakimov --- Created attachment 137361 --> https://bugs.freedesktop.org/attachment.cgi?id=137361&action=edit XOrg log -- You are receiving this mail because: You are the assignee for the bug.
[Nouveau] [Bug 105097] New: Computer hangs only mouse moves
https://bugs.freedesktop.org/show_bug.cgi?id=105097

Bug ID: 105097
Summary: Computer hangs only mouse moves
Product: xorg
Version: unspecified
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: Driver/nouveau
Assignee: nouveau@lists.freedesktop.org
Reporter: yaru...@gmail.com
QA Contact: xorg-t...@lists.x.org

Created attachment 137360 --> https://bugs.freedesktop.org/attachment.cgi?id=137360&action=edit dmesg log

I have attached dmesg output and found some messages in syslog before freezing:

Feb 14 22:24:28 dmitry-Aspire-X3470 kernel: [ 73.308412] nouveau :01:00.0: fifo: write fault at 24 engine 00 [GR] client 0f [GPC0/PROP_0] reason 02 [PTE] on channel 2 [003fbf8000 Xorg[1036]]
Feb 14 22:24:28 dmitry-Aspire-X3470 kernel: [ 73.308436] nouveau :01:00.0: fifo: channel 2: killed
Feb 14 22:24:28 dmitry-Aspire-X3470 kernel: [ 73.308440] nouveau :01:00.0: fifo: runlist 0: scheduled for recovery
Feb 14 22:24:28 dmitry-Aspire-X3470 kernel: [ 73.308446] nouveau :01:00.0: fifo: engine 0: scheduled for recovery
Feb 14 22:24:28 dmitry-Aspire-X3470 kernel: [ 73.308539] nouveau :01:00.0: Xorg[1036]: channel 2 killed!

-- You are receiving this mail because: You are the assignee for the bug.
Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
Actually this was brought up to me already, there's a fix on the mailing list for this I reviewed a little while ago from nvidia that we should pull in: https://patchwork.freedesktop.org/patch/203205/ Would you guys mind confirming that this patch fixes your issues? On Wed, 2018-02-14 at 18:41 +0100, Pierre Moreau wrote: > On 2018-02-14 — 09:36, Ilia Mirkin wrote: > > On Wed, Feb 14, 2018 at 9:35 AM, Ilia Mirkin wrote: > > > On Wed, Feb 14, 2018 at 9:29 AM, Meelis Roos wrote: > > > > > This is 4.16-rc1+today's git on a lowly P4 with NV5, worked fine in > > > > > 4.15: > > > > > > > > NV5 in another PC (secondary card in x86-64) made the system crash on > > > > boot, in nvkm_therm_clkgate_fini. > > > > > > Mind booting with nouveau.debug=trace? That should hopefully tell us > > > more exactly which thing is dying. If you have a cross-compile/distcc > > > setup handy, a bisect may be even more useful. > > > > Erm, sorry, nevermind. You even said it -- nvkm_therm_clkgate_fini is > > somehow mis-hooked up for NV5 now. A bisect result would still make > > the culprit a lot more obvious. > > CC’ing Lyude Paul as she hooked up the clockgating support. > > Looking at the code, only NV40+ have a therm engine. Therefore, shouldn’t > nvkm_therm_clkgate_enable(), nvkm_therm_clkgate_fini() and > nvkm_therm_clkgate_oneinit() all check for therm being not NULL, on top of > their check for the clkgate_* hooks being there? Or instead, maybe have the > check in nvkm_device_init()? > > Pierre -- Cheers, Lyude Paul
Re: [Nouveau] [PATCH v2] drm: Allow determining if current task is output poll worker
I think your idea of having the extra kerneldoc as a separate patch to make this easier to backport should work fine :). Thanks for the good work!

Reviewed-by: Lyude Paul

On Wed, 2018-02-14 at 08:41 +0100, Lukas Wunner wrote:
> Introduce a helper to determine if the current task is an output poll
> worker.
>
> This allows us to fix a long-standing deadlock in several DRM drivers
> wherein the ->runtime_suspend callback waits for the output poll worker
> to finish and the worker in turn calls a ->detect callback which waits
> for runtime suspend to finish. The ->detect callback is invoked from
> multiple call sites and waiting for runtime suspend to finish is the
> correct thing to do except if it's executing in the context of the
> worker.
>
> v2: Expand kerneldoc to specifically mention deadlock between
> output poll worker and autosuspend worker as use case. (Lyude)
>
> Cc: Dave Airlie
> Cc: Ben Skeggs
> Cc: Alex Deucher
> Reviewed-by: Lyude Paul
> Signed-off-by: Lukas Wunner
> ---
>  drivers/gpu/drm/drm_probe_helper.c | 20 ++++++++++++++++++++
>  include/drm/drm_crtc_helper.h      |  1 +
>  2 files changed, 21 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_probe_helper.c b/drivers/gpu/drm/drm_probe_helper.c
> index 6dc2dde5b672..7a6b2dc08913 100644
> --- a/drivers/gpu/drm/drm_probe_helper.c
> +++ b/drivers/gpu/drm/drm_probe_helper.c
> @@ -654,6 +654,26 @@ static void output_poll_execute(struct work_struct *work)
>  		schedule_delayed_work(delayed_work, DRM_OUTPUT_POLL_PERIOD);
>  }
>
> +/**
> + * drm_kms_helper_is_poll_worker - is %current task an output poll worker?
> + *
> + * Determine if %current task is an output poll worker. This can be used
> + * to select distinct code paths for output polling versus other contexts.
> + *
> + * One use case is to avoid a deadlock between the output poll worker and
> + * the autosuspend worker wherein the latter waits for polling to finish
> + * upon calling drm_kms_helper_poll_disable(), while the former waits for
> + * runtime suspend to finish upon calling pm_runtime_get_sync() in a
> + * connector ->detect hook.
> + */
> +bool drm_kms_helper_is_poll_worker(void)
> +{
> +	struct work_struct *work = current_work();
> +
> +	return work && work->func == output_poll_execute;
> +}
> +EXPORT_SYMBOL(drm_kms_helper_is_poll_worker);
> +
>  /**
>   * drm_kms_helper_poll_disable - disable output polling
>   * @dev: drm_device
> diff --git a/include/drm/drm_crtc_helper.h b/include/drm/drm_crtc_helper.h
> index 76e237bd989b..6914633037a5 100644
> --- a/include/drm/drm_crtc_helper.h
> +++ b/include/drm/drm_crtc_helper.h
> @@ -77,5 +77,6 @@ void drm_kms_helper_hotplug_event(struct drm_device *dev);
>
>  void drm_kms_helper_poll_disable(struct drm_device *dev);
>  void drm_kms_helper_poll_enable(struct drm_device *dev);
> +bool drm_kms_helper_is_poll_worker(void);
>
>  #endif

-- Cheers, Lyude Paul
Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
On 2018-02-14 — 09:36, Ilia Mirkin wrote: > On Wed, Feb 14, 2018 at 9:35 AM, Ilia Mirkin wrote: > > On Wed, Feb 14, 2018 at 9:29 AM, Meelis Roos wrote: > >>> This is 4.16-rc1+today's git on a lowly P4 with NV5, worked fine in 4.15: > >> > >> NV5 in another PC (secondary card in x86-64) made the system crash on > >> boot, in nvkm_therm_clkgate_fini. > > > > Mind booting with nouveau.debug=trace? That should hopefully tell us > > more exactly which thing is dying. If you have a cross-compile/distcc > > setup handy, a bisect may be even more useful. > > Erm, sorry, nevermind. You even said it -- nvkm_therm_clkgate_fini is > somehow mis-hooked up for NV5 now. A bisect result would still make > the culprit a lot more obvious. CC’ing Lyude Paul as she hooked up the clockgating support. Looking at the code, only NV40+ have a therm engine. Therefore, shouldn’t nvkm_therm_clkgate_enable(), nvkm_therm_clkgate_fini() and nvkm_therm_clkgate_oneinit() all check for therm being not NULL, on top of their check for the clkgate_* hooks being there? Or instead, maybe have the check in nvkm_device_init()? Pierre
Re: [Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers
On 12 February 2018 at 03:39, Lukas Wunner wrote: > On Mon, Feb 12, 2018 at 12:35:51AM +, Mike Lothian wrote: >> I've not been able to reproduce the original problem you're trying to >> solve on amdgpu that's with or without your patch set and the above >> "trigger" too >> >> Is anything else required to trigger it, I started multiple DRI_PRIME >> glxgears, in parallel, serial waiting the 12 seconds and serial within >> the 12 seconds and I couldn't reproduce it > > The discrete GPU needs to runtime suspend, that's the trigger, > so no DRI_PRIME executables should be running. Just let it > autosuspend after boot. Do you see "waiting 12 sec" messages > in dmesg? If not it's not autosuspending. > > Thanks, > > Lukas Hi Yes I'm seeing those messages, I'm just not seeing the hangs I've attached the dmesg in case you're interested Regards Mike
Re: [Nouveau] [PATCH 1/5] workqueue: Allow retrieval of current task's work struct
Hello, On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote: > Introduce a helper to retrieve the current task's work struct if it is > a workqueue worker. > > This allows us to fix a long-standing deadlock in several DRM drivers > wherein the ->runtime_suspend callback waits for a specific worker to > finish and that worker in turn calls a function which waits for runtime > suspend to finish. That function is invoked from multiple call sites > and waiting for runtime suspend to finish is the correct thing to do > except if it's executing in the context of the worker. > > Cc: Tejun Heo > Cc: Lai Jiangshan > Cc: Dave Airlie > Cc: Ben Skeggs > Cc: Alex Deucher > Signed-off-by: Lukas Wunner I wonder whether it's too generic a name but there are other functions named in a similar fashion and AFAICS current_work isn't used by anyone in the tree, so it seems okay. Acked-by: Tejun Heo Please feel free to route as you see fit. Thanks. -- tejun
Re: [Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers
On Wed, Feb 14, 2018 at 03:43:56PM +0100, Michel Dänzer wrote: > On 2018-02-14 03:08 PM, Sean Paul wrote: > > On Wed, Feb 14, 2018 at 10:26:35AM +0100, Maarten Lankhorst wrote: > >> Op 14-02-18 om 09:46 schreef Lukas Wunner: > >>> Dear drm-misc maintainers, > >>> > >>> On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote: > Fix a deadlock on hybrid graphics laptops that's been present since 2013: > >>> This series has been reviewed, consent has been expressed by the most > >>> interested parties, patch [1/5] which touches files outside drivers/gpu > >>> has been acked and I've just sent out a v2 addressing the only objection > >>> raised. My plan is thus to wait another two days for comments and, > >>> barring further objections, push to drm-misc this weekend. > >>> > >>> However I'm struggling with the decision whether to push to next or > >>> fixes. The series is marked for stable, however the number of > >>> affected machines is limited and for an issue that's been present > >>> for 5 years it probably doesn't matter if it soaks another two months > >>> in linux-next before it gets backported. Hence I tend to err on the > >>> side of caution and push to next, however a case could be made that > >>> fixes is more appropriate. > >>> > >>> I'm lacking experience making such decisions and would be interested > >>> to learn how you'd handle this. > >>> > >>> Thanks, > >>> > >>> Lukas > >> > >> I would say fixes, it doesn't look particularly scary. :) > > > > Agreed. If it's good enough for stable, it's good enough for -fixes! > > It's not that simple, is it? Fast-tracking patches (some of which appear > to be untested) to stable without an immediate cause for urgency seems > risky to me. > /me should be more careful what he says Given where we are in the release cycle, it's barely a fast track. If these go in -fixes, they'll get in -rc2 and will have plenty of time to bake. If we were at rc5, it might be a different story. 
Sean > > -- > Earthling Michel Dänzer | http://www.amd.com > Libre software enthusiast | Mesa and X developer -- Sean Paul, Software Engineer, Google / Chromium OS
Re: [Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers
On 2018-02-14 03:08 PM, Sean Paul wrote: > On Wed, Feb 14, 2018 at 10:26:35AM +0100, Maarten Lankhorst wrote: >> Op 14-02-18 om 09:46 schreef Lukas Wunner: >>> Dear drm-misc maintainers, >>> >>> On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote: Fix a deadlock on hybrid graphics laptops that's been present since 2013: >>> This series has been reviewed, consent has been expressed by the most >>> interested parties, patch [1/5] which touches files outside drivers/gpu >>> has been acked and I've just sent out a v2 addressing the only objection >>> raised. My plan is thus to wait another two days for comments and, >>> barring further objections, push to drm-misc this weekend. >>> >>> However I'm struggling with the decision whether to push to next or >>> fixes. The series is marked for stable, however the number of >>> affected machines is limited and for an issue that's been present >>> for 5 years it probably doesn't matter if it soaks another two months >>> in linux-next before it gets backported. Hence I tend to err on the >>> side of caution and push to next, however a case could be made that >>> fixes is more appropriate. >>> >>> I'm lacking experience making such decisions and would be interested >>> to learn how you'd handle this. >>> >>> Thanks, >>> >>> Lukas >> >> I would say fixes, it doesn't look particularly scary. :) > > Agreed. If it's good enough for stable, it's good enough for -fixes! It's not that simple, is it? Fast-tracking patches (some of which appear to be untested) to stable without an immediate cause for urgency seems risky to me. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer
Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
On Wed, Feb 14, 2018 at 9:35 AM, Ilia Mirkin wrote: > On Wed, Feb 14, 2018 at 9:29 AM, Meelis Roos wrote: >>> This is 4.16-rc1+today's git on a lowly P4 with NV5, worked fine in 4.15: >> >> NV5 in another PC (secondary card in x86-64) made the system crash on >> boot, in nvkm_therm_clkgate_fini. > > Mind booting with nouveau.debug=trace? That should hopefully tell us > more exactly which thing is dying. If you have a cross-compile/distcc > setup handy, a bisect may be even more useful. Erm, sorry, nevermind. You even said it -- nvkm_therm_clkgate_fini is somehow mis-hooked up for NV5 now. A bisect result would still make the culprit a lot more obvious.
Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
On Wed, Feb 14, 2018 at 9:29 AM, Meelis Roos wrote: >> This is 4.16-rc1+today's git on a lowly P4 with NV5, worked fine in 4.15: > > NV5 in another PC (secondary card in x86-64) made the system crash on > boot, in nvkm_therm_clkgate_fini. Mind booting with nouveau.debug=trace? That should hopefully tell us more exactly which thing is dying. If you have a cross-compile/distcc setup handy, a bisect may be even more useful. It's funny, I had a NV5 plugged into my desktop for testing, and *just* took it out (because the box wouldn't even get to BIOS anymore ... although it was unrelated to the NV5, probably just something mis-seated.) -ilia
Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini
> This is 4.16-rc1+today's git on a lowly P4 with NV5, worked fine in 4.15: NV5 in another PC (secondary card in x86-64) made the system crash on boot, in nvkm_therm_clkgate_fini. -- Meelis Roos (mr...@linux.ee)
Re: [Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers
On Wed, Feb 14, 2018 at 10:26:35AM +0100, Maarten Lankhorst wrote: > Op 14-02-18 om 09:46 schreef Lukas Wunner: > > Dear drm-misc maintainers, > > > > On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote: > >> Fix a deadlock on hybrid graphics laptops that's been present since 2013: > > This series has been reviewed, consent has been expressed by the most > > interested parties, patch [1/5] which touches files outside drivers/gpu > > has been acked and I've just sent out a v2 addressing the only objection > > raised. My plan is thus to wait another two days for comments and, > > barring further objections, push to drm-misc this weekend. > > > > However I'm struggling with the decision whether to push to next or > > fixes. The series is marked for stable, however the number of > > affected machines is limited and for an issue that's been present > > for 5 years it probably doesn't matter if it soaks another two months > > in linux-next before it gets backported. Hence I tend to err on the > > side of caution and push to next, however a case could be made that > > fixes is more appropriate. > > > > I'm lacking experience making such decisions and would be interested > > to learn how you'd handle this. > > > > Thanks, > > > > Lukas > > I would say fixes, it doesn't look particularly scary. :) Agreed. If it's good enough for stable, it's good enough for -fixes! Sean > > ~Maarten > -- Sean Paul, Software Engineer, Google / Chromium OS
Re: [Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers
On Tue, Feb 13, 2018 at 03:46:08PM +, Liviu Dudau wrote: > On Tue, Feb 13, 2018 at 12:52:06PM +0100, Lukas Wunner wrote: > > On Tue, Feb 13, 2018 at 10:55:06AM +, Liviu Dudau wrote: > > > On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote: > > > > DRM drivers poll connectors in 10 sec intervals. The poll worker is > > > > stopped on ->runtime_suspend with cancel_delayed_work_sync(). However > > > > the poll worker invokes the DRM drivers' ->detect callbacks, which call > > > > pm_runtime_get_sync(). If the poll worker starts after runtime suspend > > > > has begun, pm_runtime_get_sync() will wait for runtime suspend to finish > > > > with the intention of runtime resuming the device afterwards. The > > > > result > > > > is a circular wait between poll worker and autosuspend worker. > > > > > > I think I understand the problem you are trying to solve, but I'm > > > struggling to understand where malidp makes any specific mistakes. First > > > of all, malidp is only a display engine, so there is no GPU attached to > > > it, but that is only a small clarification. Second, malidp doesn't use > > > directly any of the callbacks that you are referring to, it uses the > > > drm_cma_() API plus the generic drm_() call. So if there are any > > > issues there (as they might well be) I think they would apply to a lot > > > more drivers and the fix will involve more than just malidp, i915 and > > > msm. [snip] > > There are no ->detect hooks declared > > in drivers/gpu/drm/arm/, so it's unclear to me whether you're able to probe > > during runtime suspend. > > That's because the drivers in drivers/gpu/drm/arm do not have > connectors, they are only the CRTC part of the driver. Both hdlcd and > mali-dp use the component framework to locate an encoder in device tree > that will then provide the connectors. > > > > > hdlcd_drv.c and malidp_drv.c both enable output polling. Output polling > > is only necessary if you don't get HPD interrupts. 
> > That's right, hdlcd and mali-dp don't receive HPD interrupts because > they don't have any. And because we don't know ahead of time which > encoder/connector will be paired with the driver, we enable polling as a > safe fallback. > Looking e.g. at inno_hdmi.c (used by rk3036.dtsi), this calls drm_helper_hpd_irq_event() on receiving an HPD interrupt, and that function returns immediately if polling is not enabled. So you *have* to enable polling to receive HPD events. You seem to keep the crtc runtime active as long as it's bound to an encoder. If you do not ever intend to runtime suspend the crtc while an encoder is attached, you don't need to keep polling enabled during runtime suspend (because there's nothing to poll), but it shouldn't hurt either. If you would runtime suspend while an encoder is attached, then you would only runtime resume every 10 sec (upon polling) if the encoder was a child of the crtc and would support runtime suspend as well. That's because the PM core wakes the parent by default when a child runtime resumes. However in the DT's I've looked at, the encoder is never a child of the crtc and at least inno_hdmi.c doesn't use runtime suspend. So I think you're all green, I can't spot any grave issues here. Just be aware of the above-mentioned constraints. Thanks, Lukas
Re: [Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers
Op 14-02-18 om 09:46 schreef Lukas Wunner: > Dear drm-misc maintainers, > > On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote: >> Fix a deadlock on hybrid graphics laptops that's been present since 2013: > This series has been reviewed, consent has been expressed by the most > interested parties, patch [1/5] which touches files outside drivers/gpu > has been acked and I've just sent out a v2 addressing the only objection > raised. My plan is thus to wait another two days for comments and, > barring further objections, push to drm-misc this weekend. > > However I'm struggling with the decision whether to push to next or > fixes. The series is marked for stable, however the number of > affected machines is limited and for an issue that's been present > for 5 years it probably doesn't matter if it soaks another two months > in linux-next before it gets backported. Hence I tend to err on the > side of caution and push to next, however a case could be made that > fixes is more appropriate. > > I'm lacking experience making such decisions and would be interested > to learn how you'd handle this. > > Thanks, > > Lukas I would say fixes, it doesn't look particularly scary. :) ~Maarten
Re: [Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers
Dear drm-misc maintainers, On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote: > Fix a deadlock on hybrid graphics laptops that's been present since 2013: This series has been reviewed, consent has been expressed by the most interested parties, patch [1/5] which touches files outside drivers/gpu has been acked and I've just sent out a v2 addressing the only objection raised. My plan is thus to wait another two days for comments and, barring further objections, push to drm-misc this weekend. However I'm struggling with the decision whether to push to next or fixes. The series is marked for stable, however the number of affected machines is limited and for an issue that's been present for 5 years it probably doesn't matter if it soaks another two months in linux-next before it gets backported. Hence I tend to err on the side of caution and push to next, however a case could be made that fixes is more appropriate. I'm lacking experience making such decisions and would be interested to learn how you'd handle this. Thanks, Lukas
Re: [Nouveau] [PATCH 2/5] drm: Allow determining if current task is output poll worker
On Mon, Feb 12, 2018 at 12:46:11PM -0500, Lyude Paul wrote: > On Sun, 2018-02-11 at 10:38 +0100, Lukas Wunner wrote: > > Introduce a helper to determine if the current task is an output poll > > worker. > > > > This allows us to fix a long-standing deadlock in several DRM drivers > > wherein the ->runtime_suspend callback waits for the output poll worker > > to finish and the worker in turn calls a ->detect callback which waits > > for runtime suspend to finish. The ->detect callback is invoked from > > multiple call sites and waiting for runtime suspend to finish is the > > correct thing to do except if it's executing in the context of the > > worker. [snip] > > +/** > > + * drm_kms_helper_is_poll_worker - is %current task an output poll worker? > > + * > > + * Determine if %current task is an output poll worker. This can be used > > + * to select distinct code paths for output polling versus other contexts. > > + */ > > For this, it would be worth explicitly noting in the comments here that this > should be called by DRM drivers in order to prevent racing with hotplug > polling workers, so that new drivers in the future can avoid implementing this > race condition in their driver. Good point, I've just sent out a v2 to address your comment. Let me know if this isn't what you had in mind. It may also be worthwhile to expand the DOC section at the top of drm_probe_helper.c to explain the interaction between polling and runtime suspend in more detail, but I think this is better done in a separate patch to keep the present patch small and thus easily backportable to stable. Thanks a lot for the review, Lukas