Re: [NOT A REGRESSION] firmware: framebuffer-coreboot: duplicate device name "simple-framebuffer.0"

2024-09-13 Thread Brian Norris
Hi Javier,

On Thu, Sep 12, 2024 at 06:33:58PM +0200, Javier Martinez Canillas wrote:
> That's a very good point. I'm actually not familiar with Coreboot and I
> used an educated guess (in the case of DT for example, that's the main
> source of truth and I didn't know if a Core table was in a similar vein).
> 
> Maybe something like the following (untested) patch then?

Julius is more familiar with the Coreboot + payload ecosystem than me,
but his explanations make sense to me, as does this patch.

> From de1c32017006f4671d91b695f4d6b4e99c073ab2 Mon Sep 17 00:00:00 2001
> From: Javier Martinez Canillas 
> Date: Thu, 12 Sep 2024 18:31:55 +0200
> Subject: [PATCH] firmware: coreboot: Don't register a pdev if screen_info data
>  is available
> 
> On Coreboot platforms, a system framebuffer may be provided to the Linux
> kernel by filling a LB_TAG_FRAMEBUFFER entry in the Coreboot table. But
> a Coreboot payload (e.g: SeaBIOS) could also provide this information to
> the Linux kernel.
> 
> If that the case, early arch x86 boot code will fill the global struct
> screen_info data and that data used by the Generic System Framebuffers
> (sysfb) framework to add a platform device with platform data about the
> system framebuffer.

Normally, these sorts of "early" and "later" ordering descriptions would
set alarm bells when talking about independent drivers. But I suppose
the "early arch" code has better ordering guaranteeds than drivers, so
this should be fine.

> But later then the framebuffer_coreboot driver will try to do the same
> framebuffer (using the information from the Coreboot table), which will
> lead to an error due a simple-framebuffer.0 device already registered:
> 
> sysfs: cannot create duplicate filename 
> '/bus/platform/devices/simple-framebuffer.0'
> ...
> coreboot: could not register framebuffer
> framebuffer coreboot8: probe with driver framebuffer failed with error -17
> 
> To prevent the issue, make the framebuffer_core driver to not register a
> platform device if the global struct screen_info data has been filled.
> 
> Reported-by: Brian Norris 
> Link: 
> https://lore.kernel.org/all/ZuCG-DggNThuF4pj@b20ea791c01f/T/#ma7fb65acbc1a56042258adac910992bb225a20d2
> Suggested-by: Julius Werner 
> Signed-off-by: Javier Martinez Canillas 
> ---
>  drivers/firmware/google/framebuffer-coreboot.c | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/drivers/firmware/google/framebuffer-coreboot.c 
> b/drivers/firmware/google/framebuffer-coreboot.c
> index daadd71d8ddd..4e50da17cd7e 100644
> --- a/drivers/firmware/google/framebuffer-coreboot.c
> +++ b/drivers/firmware/google/framebuffer-coreboot.c
> @@ -15,6 +15,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "coreboot_table.h"
>  
> @@ -27,6 +28,7 @@ static int framebuffer_probe(struct coreboot_device *dev)
>   int i;
>   u32 length;
>   struct lb_framebuffer *fb = &dev->framebuffer;
> + struct screen_info *si = &screen_info;
>   struct platform_device *pdev;
>   struct resource res;
>   struct simplefb_platform_data pdata = {
> @@ -36,6 +38,20 @@ static int framebuffer_probe(struct coreboot_device *dev)
>   .format = NULL,
>   };
>  
> + /*
> +  * If the global screen_info data has been filled, the Generic
> +  * System Framebuffers (sysfb) will already register a platform

Did you mean 'platform_device'?

> +  * and pass the screen_info as platform_data to a driver that
> +  * could scan-out using the system provided framebuffer.
> +  *
> +  * On Coreboot systems, the advertise LB_TAG_FRAMEBUFFER entry

s/advertise/advertised/ ?

> +  * in the Coreboot table should only be used if the payload did
> +  * not set video mode info and passed it to the Linux kernel.

s/passed/pass/

> +  */
> + if (si->orig_video_isVGA == VIDEO_TYPE_VLFB ||
> +si->orig_video_isVGA == VIDEO_TYPE_EFI)

This line is using spaces for indentation. It should use a tab, and then
spaces for alignment. But presumably this will change based on Thomas's
suggestions anyway.

> + return -EINVAL;

Is EINVAL right? IIUC, that will print a noisier error to the logs. I
believe the "expected" sorts of return codes are ENODEV or ENXIO. (See
call_driver_probe().) ENODEV seems like a fine choice, similar to
several of the other return codes already used here.

Anyway, this seems along the right track. Thanks for tackling, and feel
free to carry a:

Reviewed-by: Brian Norris 

> +
>   if (!fb->physical_address)
>   return -ENODEV;
>  
> -- 
> Best regards,
> 
> Javier Martinez Canillas
> Core Platforms
> Red Hat
> 


[NOT A REGRESSION] firmware: framebuffer-coreboot: duplicate device name "simple-framebuffer.0"

2024-09-10 Thread Brian Norris
(Tweaking subject; this indeed isn't related to the regression at all)

Hi,

On Mon, Sep 09, 2024 at 10:02:00AM +0200, Borislav Petkov wrote:
> Looking at your log, the first warn is in framebuffer_coreboot. Some mess in
> the sysfs platform devices registration.
> 
> Adding the relevant people for that:
> 
> Aug 20 20:29:36 luna kernel: sysfs: cannot create duplicate filename 
> '/bus/platform/devices/simple-framebuffer.0'
> Aug 20 20:29:36 luna kernel: CPU: 5 PID: 571 Comm: (udev-worker) Tainted: G   
> OE  6.10.6-arch1-1 #1 703d152c24f1971e36f16e505405e456fc9e23f8
> Aug 20 20:29:36 luna kernel: Hardware name: Purism Librem 14/Librem 14, BIOS 
> 4.14-Purism-1 06/18/2021
> Aug 20 20:29:36 luna kernel: Call Trace:
> Aug 20 20:29:36 luna kernel:  
> Aug 20 20:29:36 luna kernel:  dump_stack_lvl+0x5d/0x80
> Aug 20 20:29:36 luna kernel:  sysfs_warn_dup.cold+0x17/0x23
> Aug 20 20:29:36 luna kernel:  sysfs_do_create_link_sd+0xcf/0xe0
> Aug 20 20:29:36 luna kernel:  bus_add_device+0x6b/0x130
> Aug 20 20:29:36 luna kernel:  device_add+0x3b3/0x870
> Aug 20 20:29:36 luna kernel:  platform_device_add+0xed/0x250
> Aug 20 20:29:36 luna kernel:  platform_device_register_full+0xbb/0x140
> Aug 20 20:29:36 luna kernel:  
> platform_device_register_resndata.constprop.0+0x54/0x80 [framebuffer_coreboot 
> a587d2fc243ebaa0205c3badd33442a004d284e0]
> Aug 20 20:29:36 luna kernel:  framebuffer_probe+0x165/0x1b0 
> [framebuffer_coreboot a587d2fc243ebaa0205c3badd33442a004d284e0]
> Aug 20 20:29:36 luna kernel:  really_probe+0xdb/0x340
> Aug 20 20:29:36 luna kernel:  ? pm_runtime_barrier+0x54/0x90
> Aug 20 20:29:36 luna kernel:  ? __pfx___driver_attach+0x10/0x10
> Aug 20 20:29:36 luna kernel:  __driver_probe_device+0x78/0x110
> Aug 20 20:29:36 luna kernel:  driver_probe_device+0x1f/0xa0
> Aug 20 20:29:36 luna kernel:  __driver_attach+0xba/0x1c0
> Aug 20 20:29:36 luna kernel:  bus_for_each_dev+0x8c/0xe0
> Aug 20 20:29:36 luna kernel:  bus_add_driver+0x112/0x1f0
> Aug 20 20:29:36 luna kernel:  driver_register+0x72/0xd0
> Aug 20 20:29:36 luna kernel:  ? __pfx_framebuffer_driver_init+0x10/0x10 
> [framebuffer_coreboot a587d2fc243ebaa0205c3badd33442a004d284e0]
> Aug 20 20:29:36 luna kernel:  do_one_initcall+0x58/0x310
> Aug 20 20:29:36 luna kernel:  do_init_module+0x60/0x220
> Aug 20 20:29:36 luna kernel:  init_module_from_file+0x89/0xe0
> Aug 20 20:29:36 luna kernel:  idempotent_init_module+0x121/0x320
> Aug 20 20:29:36 luna kernel:  __x64_sys_finit_module+0x5e/0xb0
> Aug 20 20:29:36 luna kernel:  do_syscall_64+0x82/0x190
> Aug 20 20:29:36 luna kernel:  ? __do_sys_newfstatat+0x3c/0x80
> Aug 20 20:29:36 luna kernel:  ? syscall_exit_to_user_mode+0x72/0x200
> Aug 20 20:29:36 luna kernel:  ? do_syscall_64+0x8e/0x190
> Aug 20 20:29:36 luna kernel:  ? do_sys_openat2+0x9c/0xe0
> Aug 20 20:29:36 luna kernel:  ? syscall_exit_to_user_mode+0x72/0x200
> Aug 20 20:29:36 luna kernel:  ? do_syscall_64+0x8e/0x190
> Aug 20 20:29:36 luna kernel:  ? clear_bhb_loop+0x25/0x80
> Aug 20 20:29:36 luna kernel:  ? clear_bhb_loop+0x25/0x80
> Aug 20 20:29:36 luna kernel:  ? clear_bhb_loop+0x25/0x80
> Aug 20 20:29:36 luna kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> Aug 20 20:29:36 luna kernel: RIP: 0033:0x7b1bee2f81fd

Looks like it might be a conflict with
drivers/firmware/sysfb_simplefb.c, which also uses the
"simple-framebuffer" name with a constant ID of 0. It's possible both
drivers should be switched to use PLATFORM_DEVID_AUTO? Or at least one
of them. Or they should use different base names.

I'm not really sure what the best option is (does anyone rely on or care
about the device naming?), and I don't actually use this driver. But
here's an untested diff to try if you'd really like. If you test it,
feel free to submit as a proper patch with my:

Signed-off-by: Brian Norris 

diff --git a/drivers/firmware/google/framebuffer-coreboot.c 
b/drivers/firmware/google/framebuffer-coreboot.c
index daadd71d8ddd..3f1b8f664c3f 100644
--- a/drivers/firmware/google/framebuffer-coreboot.c
+++ b/drivers/firmware/google/framebuffer-coreboot.c
@@ -62,7 +62,8 @@ static int framebuffer_probe(struct coreboot_device *dev)
return -EINVAL;
 
pdev = platform_device_register_resndata(&dev->dev,
-"simple-framebuffer", 0,
+"simple-framebuffer",
+PLATFORM_DEVID_AUTO,
 &res, 1, &pdata,
 sizeof(pdata));
if (IS_ERR(pdev))


Re: [Intel-gfx] [PATCH] drm/i915/huc: fix leak of debug object in huc load fence on driver unload

2022-11-16 Thread Brian Norris
Hi Daniele,

On Thu, Nov 10, 2022 at 04:56:51PM -0800, Daniele Ceraolo Spurio wrote:
> The fence is always initialized in huc_init_early, but the cleanup in
> huc_fini is only being run if HuC is enabled. This causes a leaking of
> the debug object when HuC is disabled/not supported, which can in turn
> trigger a warning if we try to register a new debug offset at the same
> address on driver reload.
> 
> To fix the issue, make sure to always run the cleanup code.
> 
> Reported-by: Tvrtko Ursulin 
> Reported-by: Brian Norris 
> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
> Signed-off-by: Daniele Ceraolo Spurio 
> Cc: Tvrtko Ursulin 
> Cc: Brian Norris 
> Cc: Alan Previn 
> Cc: John Harrison 
> ---
> 
> Note: I didn't manage to repro the reported warning, but I did confirm
> that we weren't correctly calling i915_sw_fence_fini and that this patch
> fixes that.

I *did* reproduce, and with this patch, I no longer reproduce. So:

Tested-by: Brian Norris 

I see this differs very slightly from the draft version (which didn't
work for me):

https://lore.kernel.org/all/ac5fde11-c17d-8574-c938-c2278d53c...@intel.com/

so presumably that diff is the fix.

Thanks a bunch!

Brian

>  drivers/gpu/drm/i915/gt/uc/intel_huc.c | 12 +++-
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c  |  1 +
>  2 files changed, 8 insertions(+), 5 deletions(-)


Re: [Intel-gfx] [CI 11/15] drm/i915/huc: track delayed HuC load with a fence

2022-11-07 Thread Brian Norris
On Mon, Nov 07, 2022 at 10:38:14AM -0800, Ceraolo Spurio, Daniele wrote:
> Ok, I think I have an idea of what's happening: if HuC is not enabled, we
> skip the call to fence_fini, so we leak the debug object. Can you check if
> the below diff fixes the issue for you?

Thanks for checking! This also gives me the hint that I can try out the
HuC firmware to see if that changes anything for me. For reference,
here's the firmware bundled with ChromeOS (and that I'm running):

https://chromium.googlesource.com/chromiumos/third_party/linux-firmware/+/HEAD/i915/

We tend to pull pieces from upstream linux-firwmare.git as needed, and
seemingly ChromeOS folks haven't found HuC necessary for GLK.

> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> index fbc8bae14f76..e3bbd174889d 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> @@ -300,13 +300,12 @@ int intel_huc_init(struct intel_huc *huc)
> 
>  void intel_huc_fini(struct intel_huc *huc)
>  {
> -   if (!intel_uc_fw_is_loadable(&huc->fw))
> -   return;
> -
>     delayed_huc_load_complete(huc);
> 
>     i915_sw_fence_fini(&huc->delayed_load.fence);
> -   intel_uc_fw_fini(&huc->fw);
> +
> +   if (intel_uc_fw_is_loadable(&huc->fw))
> +   intel_uc_fw_fini(&huc->fw);
>  }

(NB: you have some very weird whitespace in there. It's neither tabs nor
spaces. This slightly increases the chance that I get your diff wrong,
since the patch doesn't apply directly. But I'm pretty sure I
hand-copied it correctly...)

Unfortunately, I still see the same(?) problem with this patch.

[   85.182000] [ cut here ]
[   85.182014] ODEBUG: init destroyed (active state 0) object type: 
i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x11 [i915]
[   85.182238] WARNING: CPU: 2 PID: 1925 at lib/debugobjects.c:505 
debug_print_object+0x6b/0x7e
[   85.182257] Modules linked in: i915(+) cmac algif_hash algif_skcipher af_alg 
btusb uvcvideo btrtl videobuf2_vmalloc btintel videobuf2_v4l2 btmtk 
videobuf2_memops videobuf2_common btbcm soundwire_intel 
soundwire_generic_allocation soundwire_cadence soundwire_bus 8021q bluetooth 
ecdh_generic ecc rtw88_8822ce rtw88_8822c rtw88_pci rtw88_core mac80211 
cfg80211 r8152 mii video wmi backlight drm_buddy intel_gtt drm_display_helper 
ttm prime_numbers joydev [last unloaded: i915]
[   85.182593] CPU: 2 PID: 1925 Comm: i915_module_loa Not tainted 
6.1.0-rc3-01115-ga397a9098fb3-dirty #35 b6325f6cdf3c04a0862a445aa86b1799d3939949
[   85.182607] Hardware name: HP Meep/Meep, BIOS Google_Meep.11297.262.0 
03/18/2021
[   85.182619] RIP: 0010:debug_print_object+0x6b/0x7e
[   85.182634] Code: 31 c9 ff c0 89 05 ae a4 67 01 8b 43 10 8b 4b 14 48 8b 14 
c5 e0 4a 86 8d 4d 8b 07 48 c7 c7 fe 47 ac 8d 4c 89 f6 e8 ba 53 c2 ff <0f> 0b ff 
05 2a 50 11 01 5b 41 5e 41 5f 5d c3 cc cc cc cc 55 48 89
[   85.182646] RSP: 0018:ad7280583638 EFLAGS: 00010246
[   85.182661] RAX: 229fb1a4f3034f00 RBX: 960980064348 RCX: 0027
[   85.182672] RDX: 0027 RSI: dfff RDI: 960af7d1b440
[   85.182683] RBP: ad7280583650 R08:  R09: ad7280583490
[   85.182693] R10: dfff R11: 8ca46e5b R12: 960988382cf8
[   85.182703] R13: c08656b0 R14: 8daffd1f R15: c08656b0
[   85.182713] FS:  7fd5f9306940() GS:960af7d0() 
knlGS:
[   85.182725] CS:  0010 DS:  ES:  CR0: 80050033
[   85.182735] CR2: 7fd5f85ef000 CR3: 0001061d2000 CR4: 00350ee0
[   85.182746] Call Trace:
[   85.182759]  
[   85.182775]  __debug_object_init+0x26c/0x5ea
[   85.182794]  ? intel_huc_init_early+0xa6/0xa6 [i915 
30ae04bc806a1fe406030ed4bf98e870eb8aa3bf]
[   85.182996]  ? 0xc03e6083
[   85.183028]  ? prepare_ftrace_return+0xa2/0xdf
[   85.183059]  ? __init_waitqueue_head+0x5/0x21
[   85.183082]  i915_sw_fence_reinit+0x19/0x3d [i915 
30ae04bc806a1fe406030ed4bf98e870eb8aa3bf]
[   85.183310]  intel_huc_init_early+0x72/0xa6 [i915 
30ae04bc806a1fe406030ed4bf98e870eb8aa3bf]
[   85.183514]  intel_uc_init_early+0x76/0x25b [i915 
30ae04bc806a1fe406030ed4bf98e870eb8aa3bf]
[   85.183697]  intel_gt_common_init_early+0xc3/0xd6 [i915 
30ae04bc806a1fe406030ed4bf98e870eb8aa3bf]
[   85.183878]  intel_root_gt_init_early+0x4c/0x5c [i915 
30ae04bc806a1fe406030ed4bf98e870eb8aa3bf]
[   85.184055]  i915_driver_probe+0x26b/0xbf9 [i915 
30ae04bc806a1fe406030ed4bf98e870eb8aa3bf]
[   85.184233]  ? drm_privacy_screen_put+0x5/0x23
[   85.184260]  i915_pci_probe+0x182/0x266 [i915 
30ae04bc806a1fe406030ed4bf98e870eb8aa3bf]
[   85.184468]  pci_device_probe+0x99/0x126
...

That only required:

  pkill frecon
  
  ./i915_module_load --run-subtest reload

Brian


Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: Set PROBE_PREFER_ASYNCHRONOUS

2022-11-04 Thread Brian Norris
On Fri, Nov 04, 2022 at 02:38:03PM +, Matthew Auld wrote:
> On Thu, 3 Nov 2022 at 00:14, Brian Norris  wrote:
> > I'm still curious about the reported failures, but maybe they require
> > some particular sequence of tests? I also don't have the full
> > igt-gpu-tools set running, so maybe they do something a little
> > differently than my steps in [1]?
> >
> > Brian
> >
> > [1] I have a GLk system, if it matters. I figured I can run some of
> > these with any one of the following:
> >
> >   modprobe i915 live_selftests=1
> >   modprobe i915 live_selftests=1 igt__20__live_workarounds=Y
> >   modprobe i915 live_selftests=1 igt__19__live_uncore=Y
> >   modprobe i915 live_selftests=1 igt__18__live_sanitycheck=Y
> >   ...
> 
> CI should be using the IGT wrapper to run them, AFAIK. So something like:
> 
> ./build/tests/i915_selftest
> 
> Or to just run the live, mock or perf:
> 
> ./build/tests/i915_selftest --run-subtest live
> ./build/tests/i915_selftest --run-subtest mock
> ./build/tests/i915_selftest --run-subtest perf
> 
> Or if you want to run some particular selftest, like live mman tests:
> 
> ./build/tests/i915_selftest --run-subtest live --dyn mman

Thanks. I'm running through those now, and it seems like I'm doing
closer to what the CI logs show [1], but I'm still not reproducing on my
GLK. (I've now managed to run it with drm-tip; still no luck.)

So far, now I've managed to just reproduced *different* known problems:

https://lore.kernel.org/all/y2wfplbx1sedt...@google.com/

But after working around those, I run without any similar lockup
failures.

I might poke around some more next week, but I've probably spent more
time than reasonable on this already.

Anyway, thanks for the help!

Regards,
Brian

[1] For one, I've run through a test list, in order, based on this:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110277v1/fi-glk-j4005/testlist0.txt


Re: [Intel-gfx] [CI 11/15] drm/i915/huc: track delayed HuC load with a fence

2022-11-04 Thread Brian Norris
Hi,

On Wed, Oct 19, 2022 at 10:54:34AM +0100, Tvrtko Ursulin wrote:
> Don't know if this is real or not yet, hit it while running selftests a bit. 
> Something to keep an eye on.
> 
> [ 2928.370577] ODEBUG: init destroyed (active state 0) object type: 
> i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x10 [i915]
> [ 2928.370903] WARNING: CPU: 2 PID: 1113 at lib/debugobjects.c:502 
> debug_print_object+0x6b/0x90
> [ 2928.370984] Modules linked in: i915(+) drm_display_helper drm_kms_helper 
> netconsole cmac algif_hash algif_skcipher af_alg bnep nls_iso8859_1 
> snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio 
> snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm 
> intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp snd_seq_midi 
> snd_seq_midi_event coretemp snd_rawmidi btusb btrtl btbcm kvm_intel btmtk 
> btintel ath10k_pci snd_seq kvm ath10k_core bluetooth snd_timer rapl 
> intel_cstate snd_seq_device input_leds mac80211 ecdh_generic libarc4 ath snd 
> ecc serio_raw intel_wmi_thunderbolt at24 soundcore cfg80211 mei_me 
> intel_xhci_usb_role_switch mei ideapad_laptop intel_pch_thermal 
> platform_profile sparse_keymap acpi_pad sch_fq_codel msr efi_pstore ip_tables 
> x_tables autofs4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel 
> sha512_ssse3 aesni_intel prime_numbers crypto_simd atkbd drm_buddy cryptd 
> vivaldi_fmap r8169 ttm i2c_i801 i2c_smbus cec realtek xhci_pci syscopyarea 
> ahci
> [ 2928.371145]  xhci_pci_renesas sysfillrect sysimgblt libahci fb_sys_fops 
> video wmi [last unloaded: drm_kms_helper]
> [ 2928.371489] CPU: 2 PID: 1113 Comm: modprobe Tainted: G U  W  
> 6.1.0-rc1 #196
> [ 2928.371550] Hardware name: LENOVO 80MX/Lenovo E31-80, BIOS DCCN34WW(V2.03) 
> 12/01/2015
> [ 2928.371615] RIP: 0010:debug_print_object+0x6b/0x90
> [ 2928.371664] Code: 49 89 c1 8b 43 10 83 c2 01 48 c7 c7 e8 be d6 bb 8b 4b 14 
> 89 15 ca be b4 02 4c 8b 45 00 48 8b 14 c5 40 56 a8 bb e8 ec 5b 60 00 <0f> 0b 
> 83 05 28 5a 3e 01 01 48 83 c4 08 5b 5d c3 83 05 1a 5a 3e 01
> [ 2928.371782] RSP: 0018:9ed841607a18 EFLAGS: 00010286
> [ 2928.371841] RAX:  RBX: 9208116a1d48 RCX: 
> 
> [ 2928.371909] RDX: 0001 RSI: bbd277d2 RDI: 
> 
> [ 2928.372024] RBP: c176a540 R08:  R09: 
> bc07a1e0
> [ 2928.372128] R10: 0001 R11: 0001 R12: 
> 9208122da830
> [ 2928.372192] R13: 92080089b000 R14: 9208122da770 R15: 
> 
> [ 2928.372259] FS:  7f53e7617c40() GS:92086e50() 
> knlGS:
> [ 2928.372365] CS:  0010 DS:  ES:  CR0: 80050033
> [ 2928.372425] CR2: 55cd28b33070 CR3: 000110dbd006 CR4: 
> 003706e0
> [ 2928.372526] Call Trace:
> [ 2928.372568]  
> [ 2928.372614]  ? intel_guc_hang_check+0xb0/0xb0 [i915]
> [ 2928.373001]  __i915_sw_fence_init+0x2b/0x50 [i915]
> [ 2928.373374]  intel_huc_init_early+0x75/0xb0 [i915]
> [ 2928.373868]  intel_uc_init_early+0x4e/0x210 [i915]
> [ 2928.374241]  intel_gt_common_init_early+0x16f/0x180 [i915]
> [ 2928.374718]  intel_root_gt_init_early+0x49/0x60 [i915]
> [ 2928.375074]  i915_driver_probe+0x917/0xed0 [i915]
...

Did you track this down? Or consider reverting? This is tripping me up
on drm-tip now when running selftests with CONFIG_DEBUG_OBJECTS=y /
CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS=y. It means I can't actually run
any subsequent tests, because of the kernel taint.

Brian


Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: Set PROBE_PREFER_ASYNCHRONOUS

2022-11-02 Thread Brian Norris
On Wed, Nov 02, 2022 at 12:18:37PM +, Matthew Auld wrote:
> On Tue, 1 Nov 2022 at 21:58, Brian Norris  wrote:
> >
> > On Fri, Oct 28, 2022 at 5:24 PM Patchwork
> >  wrote:
> > >
> > > Patch Details
> > > Series:drm/i915: Set PROBE_PREFER_ASYNCHRONOUS
> > > URL:https://patchwork.freedesktop.org/series/110277/
> > > State:failure
> > > Details:https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110277v1/index.html
> > >
> > > CI Bug Log - changes from CI_DRM_12317 -> Patchwork_110277v1
> > >
> > > Summary
> > >
> > > FAILURE
> > >
> > > Serious unknown changes coming with Patchwork_110277v1 absolutely need to 
> > > be
> > > verified manually.
> > >
> > > If you think the reported changes have nothing to do with the changes
> > > introduced in Patchwork_110277v1, please notify your bug team to allow 
> > > them
> > > to document this new failure mode, which will reduce false positives in 
> > > CI.
> > >
> > > External URL: 
> > > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110277v1/index.html
> >
> > For the record, I have almost zero idea what to do with this. From
> > what I can tell, most (all?) of these failures are flaky(?) already
> > and are probably not related to my change.
> 
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110277v1/index.html
> 
> According to that link, this change appears to break every platform
> when running the live selftests (looking at the purple squares).
> Running the selftests normally involves loading and unloading the
> module. Looking at the logs there is scary stuff like:
> 
[...]

Ah, thanks. I'm not sure what made me think the tests were failing the
same way on drm-tip, but maybe just chalk that up to my unfamiliarity
with this particular dashboard... (There are a few isolated failure
and/or flakes on drm-tip, but they don't look like this.)

Anyway, I think I managed to run some of these tests on my own platforms
[1], and I don't reproduce those failures. I do see other failures
(crashes) though, like in i915_gem_mman_live_selftests/igt_mmap, where
igt_mmap_offset() (selftest-only code) -> vm_mmap() assumes we have a
valid |current->mm|. But that's borrowing the modprobe process's memory
map, and with async probe, the selftest sequence happens in a kernel
worker instead (and current->mm is NULL). So that clearly won't work.

I suppose I could disable async probe when built as a module (I believe
it doesn't really have any value, since the module load task just waits
for the async task anyway). I'm not familiar enough with MM to know what
the vm_mmap() alternatives are, but this particular bit of code does
feel odd.

Additionally, I think this implies that live_selftests will break if
i915 is built-in (i.e., =y, not =m), as we'll again run in a
kernel-thread context at boot time. But I would hope nobody is trying to
run them that way? I guess this gets even hairier, because even if the
driver is built into the kernel, it's possible to kick them off from a
process context by tweaking the module parameters later, and then
re-binding the device... So all in all, this bug leaves an ugly
situation, with or without my patch.

I'm still curious about the reported failures, but maybe they require
some particular sequence of tests? I also don't have the full
igt-gpu-tools set running, so maybe they do something a little
differently than my steps in [1]?

Brian

[1] I have a GLk system, if it matters. I figured I can run some of
these with any one of the following:

  modprobe i915 live_selftests=1
  modprobe i915 live_selftests=1 igt__20__live_workarounds=Y
  modprobe i915 live_selftests=1 igt__19__live_uncore=Y
  modprobe i915 live_selftests=1 igt__18__live_sanitycheck=Y
  ...


Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: Set PROBE_PREFER_ASYNCHRONOUS

2022-11-01 Thread Brian Norris
On Fri, Oct 28, 2022 at 5:24 PM Patchwork
 wrote:
>
> Patch Details
> Series:drm/i915: Set PROBE_PREFER_ASYNCHRONOUS
> URL:https://patchwork.freedesktop.org/series/110277/
> State:failure
> Details:https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110277v1/index.html
>
> CI Bug Log - changes from CI_DRM_12317 -> Patchwork_110277v1
>
> Summary
>
> FAILURE
>
> Serious unknown changes coming with Patchwork_110277v1 absolutely need to be
> verified manually.
>
> If you think the reported changes have nothing to do with the changes
> introduced in Patchwork_110277v1, please notify your bug team to allow them
> to document this new failure mode, which will reduce false positives in CI.
>
> External URL: 
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_110277v1/index.html

For the record, I have almost zero idea what to do with this. From
what I can tell, most (all?) of these failures are flaky(?) already
and are probably not related to my change.

But if someone believes to actually be a blocking issue with my patch,
let me know, and I can see if I can figure out a better answer than
that.

Thanks,
Brian


Re: [Intel-gfx] [RFC] i915: make the probe asynchronous

2022-10-28 Thread Brian Norris
Hi,

On Thu, Aug 16, 2018 at 03:40:38PM +0800, Feng Tang wrote:
> On Tue, Aug 14, 2018 at 11:39:48AM +0200, Takashi Iwai wrote:
> > FYI, the upcoming 4.19 will have the completion in audio side binding,
> > so this problem should be solved there.
> 
> Really a great news! thanks for sharing

For the record: that was merged as:

  f9b54e1961c7 ("ALSA: hda/i915: Allow delayed i915 audio component binding")

I'm also poking here in case somebody still had reason we shouldn't do
this now. I wrote up my own patch, and the looked for past discussions
like this one. Feel free to comment here if there's still a problem:

  [PATCH] drm/i915: Set PROBE_PREFER_ASYNCHRONOUS
  
https://lore.kernel.org/lkml/20221028145319.1.I87b119c576d486ad139faf1b7278d97e158aabe4@changeid/

Thanks,
Brian


[Intel-gfx] [PATCH] drm/i915: Set PROBE_PREFER_ASYNCHRONOUS

2022-10-28 Thread Brian Norris
This driver often takes a good 100ms to start, but in some particularly
bad cases takes more than 1 second.

In surveying risk for this driver, I poked around for cross-device
shared state, which can be a source of race conditions. GVT support
(intel_gvt_devices) seems potentially suspect, but it has an appropriate
mutex, and doesn't seem to care about ordering -- if devices are present
when the kvmgt module loads, they'll get picked up; and if they probe
later than kvmgt, they'll attach then.

Additionally, I see past discussions about this patch [1], which
concluded that there were problems at the time due to the way
hdac_i915.c called request_module("i915") and expected it to complete
probing [2]. Work has since been merged [3] to fix that problem.

This driver was pinpointed as part of a survey of drivers that take more
than 100ms in their initcall (i.e., are built in, and probing
synchronously) on a lab of ChromeOS systems.

[1] [RFC] i915: make the probe asynchronous
https://lore.kernel.org/all/20180604053219.2040-1-feng.t...@intel.com/

[2] https://lore.kernel.org/all/s5hin4d1e3f.wl-ti...@suse.de/

[3] Commit f9b54e1961c7 ("ALSA: hda/i915: Allow delayed i915 audio
component binding")

Signed-off-by: Brian Norris 
---

 drivers/gpu/drm/i915/i915_pci.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 38460a0bd7cb..1cb1f87aea86 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1371,7 +1371,10 @@ static struct pci_driver i915_pci_driver = {
.probe = i915_pci_probe,
.remove = i915_pci_remove,
.shutdown = i915_pci_shutdown,
-   .driver.pm = &i915_pm_ops,
+   .driver = {
+   .pm = &i915_pm_ops,
+   .probe_type = PROBE_PREFER_ASYNCHRONOUS,
+   },
 };
 
 int i915_pci_register_driver(void)
-- 
2.38.1.273.g43a17bfeac-goog



Re: [Intel-gfx] [RFC PATCH] drm/i915: PSR regressions on Broadwell

2015-09-28 Thread Brian Norris
On Mon, Sep 28, 2015 at 10:14:04AM +0300, Jani Nikula wrote:
> On Sat, 26 Sep 2015, Brian Norris  wrote:
> > When using PSR, I see the screen freeze after only a few frames (sometimes a
> > split second; sometimes it seems like practically the first frame). 
> > Bisecting
> > led me to commit 3301d4092106 ("drm/i915: PSR: Fix DP_PSR_NO_TRAIN_ON_EXIT
> > logic") in v4.2. This patch is the simplest fix that gets it working again 
> > for
> > me, but it's probably wrong.
> >
> > Random thought: perhaps my panel's DPCD is programmed incorrectly?
> >
> > Anyway, any tips on fixing this properly?
> 
> Here's a thought:
> 
> diff --git a/drivers/gpu/drm/i915/intel_psr.c 
> b/drivers/gpu/drm/i915/intel_psr.c
> index a04b4dc5ed9b..3a911d4a2308 100644
> --- a/drivers/gpu/drm/i915/intel_psr.c
> +++ b/drivers/gpu/drm/i915/intel_psr.c
> @@ -274,6 +274,8 @@ static void hsw_psr_enable_source(struct intel_dp 
> *intel_dp)
>   idle_frames += 4;
>   }
>  
> + idle_frames = clamp(idle_frames, 0, 15);
> +
>   I915_WRITE(EDP_PSR_CTL(dev), val |
>  (IS_BROADWELL(dev) ? 0 : link_entry_time) |
>  max_sleep_time << EDP_PSR_MAX_SLEEP_TIME_SHIFT |

No dice. The VBT actually gives 0, so I just end up with the default 5 +
4 either way.

> We do clamp the VBT value to range 0..15, but then go on to add to it.

But this patch seems quite reasonable anyway.

Acked-by: Brian Norris 

> Otherwise, up to Rodrigo I guess.

Brian
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [RFC PATCH] drm/i915: PSR regressions on Broadwell

2015-09-25 Thread Brian Norris
When using PSR, I see the screen freeze after only a few frames (sometimes a
split second; sometimes it seems like practically the first frame). Bisecting
led me to commit 3301d4092106 ("drm/i915: PSR: Fix DP_PSR_NO_TRAIN_ON_EXIT
logic") in v4.2. This patch is the simplest fix that gets it working again for
me, but it's probably wrong.

Random thought: perhaps my panel's DPCD is programmed incorrectly?

Anyway, any tips on fixing this properly?

Seen on Chromebook Pixel 2.

Also required this patch to get PSR properly running on 4.3-rc2:

https://patchwork.freedesktop.org/patch/57698/

Signed-off-by: Brian Norris 
---
 drivers/gpu/drm/i915/intel_psr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_psr.c b/drivers/gpu/drm/i915/intel_psr.c
index 7e335a8546f6..4cd33b76b8a6 100644
--- a/drivers/gpu/drm/i915/intel_psr.c
+++ b/drivers/gpu/drm/i915/intel_psr.c
@@ -261,7 +261,8 @@ static void hsw_psr_enable_source(struct intel_dp *intel_dp)
uint32_t val = 0x0;
const uint32_t link_entry_time = EDP_PSR_MIN_LINK_ENTRY_TIME_8_LINES;
 
-   if (intel_dp->psr_dpcd[1] & DP_PSR_NO_TRAIN_ON_EXIT) {
+   if ((intel_dp->psr_dpcd[1] & DP_PSR_NO_TRAIN_ON_EXIT) &&
+   !IS_BROADWELL(dev)) {
/* It doesn't mean we shouldn't send TPS patters, so let's
   send the minimal TP1 possible and skip TP2. */
val |= EDP_PSR_TP1_TIME_100us;
-- 
2.6.0.rc2.230.g3dd15c0

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [4/7] drm/i915: PSR: Mask LPSP hw tracking back again.

2015-09-25 Thread Brian Norris
On Thu, Aug 20, 2015 at 05:55:41PM -0700, Rodrigo Vivi wrote:
> At the beginning it was masked to allow PSR at all.
> Than it got removed later by my
> commit 09108b90f040 ("drm/i915: PSR: Remove Low Power HW tracking mask.")
> in order to trying fixing one case reported at intel-gfx mailing list
> where we were missing screen updates when runtime_pm was enabled.
> 
> However I verified that other patch that makes flush to force
> invalidate also fixes this issue by itself.
> commit 169de1316c1e ("drm/i915: PSR: Flush means invalidate + flush")
> 
> Mainly now that we are relying more on frontbuffer tracking it is a
> good idea to mask this hw tracking again.
> 
> But besides all this above it is important to hightligh that with LPSP
> unmasked we started seeing some screen freezings as reported at fd.o.
> 
> This patch fixes the unrecoverable frozen screens reported:
> 
> Reference: https://bugs.freedesktop.org/show_bug.cgi?id=91436
> Reference: https://bugs.freedesktop.org/show_bug.cgi?id=91437
> 
> Cc: Ivan Mitev 
> Signed-off-by: Rodrigo Vivi 

FWIW:

Tested-by: Brian Norris 

After patching up another regression locally, I bisected down to commit
09108b90f040 ("drm/i915: PSR: Remove Low Power HW tracking mask.") when
searching for why I could rarely get into PSR. I guess that's intended
behavior from the commit message?

WARNING: With this patch if snd_intel_hda driver is
running and not releasing power well properly PSR will
constant Exit and Performance Counter will be 0.

Anyway, I guess I'm saying I'm interested in whether there is a sane
path to getting PSR going again (without disabling sound!).

Regards,
Brian
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [1/7] drm/i915: Remove duplicated dpcd write on hsw_psr_enable_sink.

2015-09-25 Thread Brian Norris
On Thu, Aug 20, 2015 at 05:55:38PM -0700, Rodrigo Vivi wrote:
> This is wrong since my commit (89251b17). The intention of that
> commit was to remove this one here that is also wrong anyway,
> but it was forgotten.
> 
> Signed-off-by: Rodrigo Vivi 

Tested-by: Brian Norris 

(caveat below)

I was just debugging some PSR issues on Broadwell and ran across your
commit 89251b17. I was also left wondering:
(1) why the duplicate DPCD write to DP_PSR_EN_CFG and
(2) why the '& ~DP_PSR_MAIN_LINK_ACTIVE' mask

Anyway, I was going to send a similar patch soon :)

*** Testing caveat: ***

PSR is still broken on my Broadwell system as of commit 3301d4092106
("drm/i915: PSR: Fix DP_PSR_NO_TRAIN_ON_EXIT logic"). I've done a
partial revert of this commit (but made sure I avoid link standby, as
commit 89251b177b58 ("drm/i915: PSR: deprecate link_standby support for
core platforms.") rightly argues).

But I'll send a proper mail for this issue separately, rather than
carrying on about it here on an unrelated patch.

*** End caveat ***

Regards,
Brian

> ---
> drivers/gpu/drm/i915/intel_psr.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_psr.c 
> b/drivers/gpu/drm/i915/intel_psr.c
> index a04b4dc..51f0514 100644
> --- a/drivers/gpu/drm/i915/intel_psr.c
> +++ b/drivers/gpu/drm/i915/intel_psr.c
> @@ -170,9 +170,6 @@ static void hsw_psr_enable_sink(struct intel_dp *intel_dp)
>  
>   aux_clock_divider = intel_dp->get_aux_clock_divider(intel_dp, 0);
>  
> - drm_dp_dpcd_writeb(&intel_dp->aux, DP_PSR_EN_CFG,
> -DP_PSR_ENABLE & ~DP_PSR_MAIN_LINK_ACTIVE);
> -
>   /* Enable AUX frame sync at sink */
>   if (dev_priv->psr.aux_frame_sync)
>   drm_dp_dpcd_writeb(&intel_dp->aux,
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx