Re: [Intel-wired-lan] REGRESSION: 86167183a17e ("igc: fix a log entry using uninitialized netdev")

2024-05-30 Thread Thorsten Leemhuis
On 27.05.24 15:50, Jani Nikula wrote:
> 
> Hi all, the Intel graphics CI hits a lockdep issue with commit
> 86167183a17e ("igc: fix a log entry using uninitialized netdev") in
> v6.10-rc1.

FWIW, there is an earlier regression report bisected to that commit id:
https://lore.kernel.org/lkml/cabxgcsokigxafa9tpkjyx7wqjbzqxqk2pztcw-rglfgo8g7...@mail.gmail.com/

And a revert is up for review:
https://lore.kernel.org/all/20240529051307.3094901-1-sasha.nef...@intel.com/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.


Re: [PATCH] drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2

2024-05-21 Thread Linux regression tracking (Thorsten Leemhuis)
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Hmm, from here it looks like the patch, although it was reviewed more
than a week ago, is still not even in -next. Is there a reason?

I know, we are in the merge window. But at the same time this is a fix
(that already lingered on the lists for way too long before it was
reviewed) for a regression in a somewhat recent kernel, so it, in Linus'
own words, should be "expedited"[1].

Or are we again just missing the right person for the job in the CC?
Adding Dave and Sima just in case.

Ciao, Thorsten

[1]
https://lore.kernel.org/all/CAHk-=wis_qqy4odnynnki5b7qhosmxtoj1jxo5wmb6sruwq...@mail.gmail.com/

On 12.05.24 18:11, Limonciello, Mario wrote:
> On 5/10/2024 4:24 AM, Jani Nikula wrote:
>> On Fri, 10 May 2024, "Lin, Wayne"  wrote:
>>>> -Original Message-
>>>> From: Limonciello, Mario 
>>>> Sent: Friday, May 10, 2024 3:18 AM
>>>> To: Linux regressions mailing list ;
>>>> Wentland, Harry
>>>> ; Lin, Wayne 
>>>> Cc: ly...@redhat.com; imre.d...@intel.com; Leon Weiß
>>>> >>> bochum.de>; sta...@vger.kernel.org; dri-de...@lists.freedesktop.org;
>>>> amd-
>>>> g...@lists.freedesktop.org; intel-gfx@lists.freedesktop.org
>>>> Subject: Re: [PATCH] drm/mst: Fix NULL pointer dereference at
>>>> drm_dp_add_payload_part2
>>>>
>>>> On 5/9/2024 07:43, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>>> On 18.04.24 21:43, Harry Wentland wrote:
>>>>>> On 2024-03-07 01:29, Wayne Lin wrote:
>>>>>>> [Why]
>>>>>>> Commit:
>>>>>>> - commit 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload
>>>>>>> allocation/removement") accidently overwrite the commit
>>>>>>> - commit 54d217406afe ("drm: use mgr->dev in drm_dbg_kms in
>>>>>>> drm_dp_add_payload_part2") which cause regression.
>>>>>>>
>>>>>>> [How]
>>>>>>> Recover the original NULL fix and remove the unnecessary input
>>>>>>> parameter 'state' for drm_dp_add_payload_part2().
>>>>>>>
>>>>>>> Fixes: 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload
>>>>>>> allocation/removement")
>>>>>>> Reported-by: Leon Weiß 
>>>>>>> Link:
>>>>>>> https://lore.kernel.org/r/38c253ea42072cc825dc969ac4e6b9b600371cc8.c
>>>>>>> a...@ruhr-uni-bochum.de/
>>>>>>> Cc: ly...@redhat.com
>>>>>>> Cc: imre.d...@intel.com
>>>>>>> Cc: sta...@vger.kernel.org
>>>>>>> Cc: regressi...@lists.linux.dev
>>>>>>> Signed-off-by: Wayne Lin 
>>>>>>
>>>>>> I haven't been deep in MST code in a while but this all looks pretty
>>>>>> straightforward and good.
>>>>>>
>>>>>> Reviewed-by: Harry Wentland 
>>>>>
>>>>> Hmmm, that was three weeks ago, but it seems since then nothing
>>>>> happened to fix the linked regression through this or some other
>>>>> patch. Is there a reason? The build failure report from the CI maybe?
>>>>
>>>> It touches files outside of amd but only has an ack from AMD.  I
>>>> think we
>>>> /probably/ want an ack from i915 and nouveau to take it through.
>>>
>>> Thanks, Mario!
>>>
>>> Hi Thorsten,
>>> Yeah, like what Mario said. Would also like to have ack from i915 and
>>> nouveau.
>>
>> It usually works better if you Cc the folks you want an ack from! ;)
>>
>> Acked-by: Jani Nikula 
>>
> 
> Thanks! Can someone with commit permissions take this to drm-misc?
> 
> 
> 


Re: [PATCH] drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2

2024-05-09 Thread Linux regression tracking (Thorsten Leemhuis)
On 18.04.24 21:43, Harry Wentland wrote:
> On 2024-03-07 01:29, Wayne Lin wrote:
>> [Why]
>> Commit:
>> - commit 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload 
>> allocation/removement")
>> accidently overwrite the commit
>> - commit 54d217406afe ("drm: use mgr->dev in drm_dbg_kms in 
>> drm_dp_add_payload_part2")
>> which cause regression.
>>
>> [How]
>> Recover the original NULL fix and remove the unnecessary input parameter 
>> 'state' for
>> drm_dp_add_payload_part2().
>>
>> Fixes: 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload 
>> allocation/removement")
>> Reported-by: Leon Weiß 
>> Link: 
>> https://lore.kernel.org/r/38c253ea42072cc825dc969ac4e6b9b600371cc8.ca...@ruhr-uni-bochum.de/
>> Cc: ly...@redhat.com
>> Cc: imre.d...@intel.com
>> Cc: sta...@vger.kernel.org
>> Cc: regressi...@lists.linux.dev
>> Signed-off-by: Wayne Lin 
> 
> I haven't been deep in MST code in a while but this all looks
> pretty straightforward and good.
> 
> Reviewed-by: Harry Wentland 

Hmmm, that was three weeks ago, but it seems since then nothing happened
to fix the linked regression through this or some other patch. Is there
a reason? The build failure report from the CI maybe?

Wayne Lin, do you know what's up?

Ciao, Thorsten

>> ---
>>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 2 +-
>>  drivers/gpu/drm/display/drm_dp_mst_topology.c | 4 +---
>>  drivers/gpu/drm/i915/display/intel_dp_mst.c   | 2 +-
>>  drivers/gpu/drm/nouveau/dispnv50/disp.c   | 2 +-
>>  include/drm/display/drm_dp_mst_helper.h   | 1 -
>>  5 files changed, 4 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c 
>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
>> index c27063305a13..2c36f3d00ca2 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
>> @@ -363,7 +363,7 @@ void dm_helpers_dp_mst_send_payload_allocation(
>>  mst_state = to_drm_dp_mst_topology_state(mst_mgr->base.state);
>>  new_payload = drm_atomic_get_mst_payload_state(mst_state, 
>> aconnector->mst_output_port);
>>  
>> -ret = drm_dp_add_payload_part2(mst_mgr, mst_state->base.state, 
>> new_payload);
>> +ret = drm_dp_add_payload_part2(mst_mgr, new_payload);
>>  
>>  if (ret) {
>>  amdgpu_dm_set_mst_status(&aconnector->mst_status,
>> diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c 
>> b/drivers/gpu/drm/display/drm_dp_mst_topology.c
>> index 03d528209426..95fd18f24e94 100644
>> --- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
>> +++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
>> @@ -3421,7 +3421,6 @@ EXPORT_SYMBOL(drm_dp_remove_payload_part2);
>>  /**
>>   * drm_dp_add_payload_part2() - Execute payload update part 2
>>   * @mgr: Manager to use.
>> - * @state: The global atomic state
>>   * @payload: The payload to update
>>   *
>>   * If @payload was successfully assigned a starting time slot by 
>> drm_dp_add_payload_part1(), this
>> @@ -3430,14 +3429,13 @@ EXPORT_SYMBOL(drm_dp_remove_payload_part2);
>>   * Returns: 0 on success, negative error code on failure.
>>   */
>>  int drm_dp_add_payload_part2(struct drm_dp_mst_topology_mgr *mgr,
>> - struct drm_atomic_state *state,
>>   struct drm_dp_mst_atomic_payload *payload)
>>  {
>>  int ret = 0;
>>  
>>  /* Skip failed payloads */
>>  if (payload->payload_allocation_status != 
>> DRM_DP_MST_PAYLOAD_ALLOCATION_DFP) {
>> -drm_dbg_kms(state->dev, "Part 1 of payload creation for %s 
>> failed, skipping part 2\n",
>> +drm_dbg_kms(mgr->dev, "Part 1 of payload creation for %s 
>> failed, skipping part 2\n",
>>  payload->port->connector->name);
>>  return -EIO;
>>  }
>> diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c 
>> b/drivers/gpu/drm/i915/display/intel_dp_mst.c
>> index 53aec023ce92..2fba66aec038 100644
>> --- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
>> +++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
>> @@ -1160,7 +1160,7 @@ static void intel_mst_enable_dp(struct 
>> intel_atomic_state *state,
>>  if (first_mst_stream)
>>  intel_ddi_wait_for_fec_status(encoder, pipe_config, true);
>>  
>> -drm_dp_add_payload_part2(&intel_dp->mst_mgr, &state->base,
>> +drm_dp_add_payload_part2(&intel_dp->mst_mgr,
>>   drm_atomic_get_mst_payload_state(mst_state, 
>> connector->port));
>>  
>>  if (DISPLAY_VER(dev_priv) >= 12)
>> diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c 
>> b/drivers/gpu/drm/nouveau/dispnv50/disp.c
>> index 0c3d88ad0b0e..88728a0b2c25 100644
>> --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
>> +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
>> @@ -915,7 +915,7 @@ nv50_msto_cleanup(struct drm_atomic_state *state,
>>  msto->disabled = false;
>>
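
The fix discussed in this thread boils down to taking the device pointer
from an object that is always valid at this point (the manager) instead of
one that may be NULL (the atomic state). A minimal, self-contained C sketch
of that pattern follows. All sketch_* types and functions are hypothetical
stand-ins invented for this illustration; they are not the DRM structures
or the real drm_dp_add_payload_part2(), and fprintf() merely stands in for
a kernel debug helper.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins, not the real DRM types. */
struct sketch_device {
	const char *name;
};

struct sketch_mgr {
	struct sketch_device *dev;	/* assumed to be valid for the manager's lifetime */
};

struct sketch_payload {
	int allocated;			/* nonzero if "part 1" succeeded */
	const char *connector_name;
};

/*
 * The function no longer takes a separate state argument. The log
 * message reads the device through mgr, so a NULL state can no
 * longer be dereferenced on the failure path.
 */
static int sketch_add_payload_part2(struct sketch_mgr *mgr,
				    struct sketch_payload *payload)
{
	/* Skip payloads whose first step failed. */
	if (!payload->allocated) {
		fprintf(stderr,
			"[%s] part 1 of payload creation for %s failed, skipping part 2\n",
			mgr->dev->name, payload->connector_name);
		return -EIO;
	}

	return 0;
}
```

Dropping the unused parameter, rather than adding a NULL check for it,
mirrors the approach described in the [How] section above: callers cannot
pass a bad pointer for an argument that no longer exists.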

Re: [REGRESSION] external monitor+Dell dock in 6.8

2024-04-02 Thread Linux regression tracking (Thorsten Leemhuis)
[Adding a few folks and list while dropping the stable list, as this is
unrelated to it]

On 31.03.24 07:59, Andrei Gaponenko wrote:
> 
> I noticed a regression with the mainline kernel pre-compiled by EPEL.
> I have just tried linux-6.9-rc1.tar.gz from kernel.org, and it still
> misbehaves.
> 
> The default setup: a laptop is connected to a dock, Dell WD22TB4, via
> a USB-C cable.  The dock is connected to an external monitor via a
> Display Port cable.  With a "good" kernel everything works.  With a
> "broken" kernel, the external monitor is still correctly identified by
> the system, and is shown as enabled in plasma systemsettings. The
> system also behaves like the monitor is working, for example, one can
> move the mouse pointer off the laptop screen.  However the external
> monitor screen stays black, and it eventually goes to sleep.

Just a quick heads up to ensure people are aware of it:

Imre Deak, turns out this is caused by a patch of yours: 55eaef16417448
("drm/i915/dp_mst: Handle the Synaptics HBlank expansion quirk"). Andrei
Gaponenko meanwhile filed a ticket about it here:

https://gitlab.freedesktop.org/drm/intel/-/issues/10637

Ciao, Thorsten

> Everything worked with EPEL mainline kernels up to and including
> kernel-ml-6.7.9-1.el9.elrepo.x86_64
> 
> The breakage is observed in
> 
> kernel-ml-6.8.1-1.el9.elrepo.x86_64
> kernel-ml-6.8.2-1.el9.elrepo.x86_64
> linux-6.9-rc1.tar.gz from kernel.org (with olddefconfig)
> 
> Other tests: using an HDMI cable instead of the Display Port cable
> between the monitor and the dock does not change things, black screen
> with the newer kernels.
> 
> Using a small HDMI-to-USB-C adapter instead of the dock results in a
> working system, even with the newer kernels.  So the breakage appears
> to be specific to the Dell WD22TB4 dock.
> 
> Operating System: AlmaLinux 9.3 (Shamrock Pampas Cat)
> 
> uname -mi: x86_64 x86_64
> 
> Laptop: Dell Precision 5470/02RK6V
> 
> lsusb |grep dock
> Bus 003 Device 007: ID 413c:b06e Dell Computer Corp. Dell dock
> Bus 003 Device 008: ID 413c:b06f Dell Computer Corp. Dell dock
> Bus 003 Device 006: ID 0bda:5413 Realtek Semiconductor Corp. Dell dock
> Bus 003 Device 005: ID 0bda:5487 Realtek Semiconductor Corp. Dell dock
> Bus 002 Device 004: ID 0bda:0413 Realtek Semiconductor Corp. Dell dock
> Bus 002 Device 003: ID 0bda:0487 Realtek Semiconductor Corp. Dell dock
> 
> dmesg and kernel config are attached to 
> https://bugzilla.kernel.org/show_bug.cgi?id=218663
> 
> #regzbot introduced: v6.7.9..v6.8.1

P.S.:

#regzbot duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=218663
#regzbot duplicate: https://gitlab.freedesktop.org/drm/intel/-/issues/10637
#regzbot title: drm/i915/dp_mst: external monitor on Dell dock broke
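
Lines starting with "#regzbot" recur throughout these mails. For readers
who want to pull them out of an archive like this one, the following is a
rough C sketch of a line filter. It is an assumption-laden illustration
only: sketch_regzbot_command() is a hypothetical helper, skipping leading
'>' quote markers is a choice made here for convenience, and nothing in it
describes how regzbot itself parses mail.

```c
#include <string.h>

/*
 * Hypothetical helper: if the line carries a "#regzbot" command
 * (optionally behind mail quote markers), return a pointer to the
 * text following the keyword; otherwise return NULL.
 */
static const char *sketch_regzbot_command(const char *line)
{
	static const char prefix[] = "#regzbot";
	const size_t len = sizeof(prefix) - 1;

	/* Assumption: tolerate leading quote markers and spaces. */
	while (*line == '>' || *line == ' ')
		line++;

	if (strncmp(line, prefix, len) != 0)
		return NULL;
	line += len;

	/* Require a separating space so "#regzbotfoo" is not accepted. */
	if (*line != ' ')
		return NULL;
	while (*line == ' ')
		line++;

	return line;
}
```

For the quoted line "> #regzbot introduced: v6.7.9..v6.8.1" the helper
returns a pointer to "introduced: v6.7.9..v6.8.1"; for an ordinary line of
mail text it returns NULL.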


Re: [PATCH] Fix divide-by-zero on DP unplug with nouveau

2024-03-11 Thread Linux regression tracking (Thorsten Leemhuis)
On 11.03.24 17:09, Imre Deak wrote:
> On Sat, Feb 10, 2024 at 09:24:59PM +, Chris Bainbridge wrote:
> Sorry for the delay.

Happens, thx for looking into this!

>> The following trace occurs when using nouveau and unplugging a DP MST
>> adaptor:
> [...] 
>> +if (bpp_x16 == 0)
>> +return 0;
> 
> Could you please move the check to the beginning of the function and add
> a debug message in case bpp_x16 is 0?
> 
> It looks odd that a driver calls this function with a 0 bpp_x16, and
> ideally it should be fixed in the driver. However as it's a regression
> and we don't have a better idea now:
> 
> Acked-by: Imre Deak 

Chris: as this went into 6.8, please consider adding a stable-tag to
ensure Greg picks this up.

Ciao, Thorsten
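
The shape Imre asks for above (check first, log, then bail out) can be
sketched in plain C as follows. This is only an illustration under stated
assumptions: sketch_bw_overhead() and its parameters are hypothetical
stand-ins, the arithmetic is a placeholder chosen so that a division by a
bpp-derived value exists, and fprintf() stands in for a kernel debug
helper. It is not the code of drm_dp_bw_overhead().

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for a helper that divides by bpp_x16.
 * The formula below is a placeholder, not the kernel's math.
 */
static int sketch_bw_overhead(int lane_count, int hactive, int bpp_x16)
{
	/* Guard at the very top, before bpp_x16 can reach a divisor. */
	if (bpp_x16 == 0) {
		/* Assumption: stderr stands in for a DRM debug message. */
		fprintf(stderr, "invalid bpp 0, skipping overhead calculation\n");
		return 0;
	}

	/* Placeholder: (hactive * lane_count * 16) / bpp_x16, rounded up. */
	return (hactive * lane_count * 16 + bpp_x16 - 1) / bpp_x16;
}
```

Placing the check before any other work means no later refactoring of the
body can move a division ahead of it, and the debug message leaves a trace
of which caller passed the zero value.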



Re: [REGRESSION] Divide-by-zero on DisplayPort MST unplug with nouveau

2024-03-11 Thread Linux regression tracking (Thorsten Leemhuis)
On 07.03.24 18:58, Chris Bainbridge wrote:
> - Forwarded message from Chris Bainbridge  
> -
> 
> Date: Sat, 10 Feb 2024 21:24:59 +

Hmm, it looks like nobody is looking into this regression. Is there a
good reason?

Imre, or did you maybe just miss that Chris' regression seems to be
caused by a commit of yours? He initially proposed a fix (the forwarded
mail that is quoted here) more than a month ago already here:
https://lore.kernel.org/all/ZcfpqwnkSoiJxeT9@debian.local/

Chris recently filed a ticket, too:
https://gitlab.freedesktop.org/drm/misc/kernel/-/issues/36

Mostly silence there as well. :-/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S: Chris, sorry, I had missed that you initially proposed the fix a
month ago; if I had noticed this earlier I would have sent a mail like this
one earlier.
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

> From: Chris Bainbridge 
> To: dri-de...@lists.freedesktop.org
> Cc: ly...@redhat.com, ville.syrj...@linux.intel.com, 
> stanislav.lisovs...@intel.com,
>   mrip...@kernel.org, imre.d...@intel.com
> Subject: [PATCH] Fix divide-by-zero on DP unplug with nouveau
> 
> The following trace occurs when using nouveau and unplugging a DP MST
> adaptor:
>  divide error:  [#1] PREEMPT SMP PTI
>  CPU: 7 PID: 2962 Comm: Xorg Not tainted 6.8.0-rc3+ #744
>  Hardware name: Razer Blade/DANA_MB, BIOS 01.01 08/31/2018
>  RIP: 0010:drm_dp_bw_overhead+0xb4/0x110 [drm_display_helper]
>  Code: c6 b8 01 00 00 00 75 61 01 c6 41 0f af f3 41 0f af f1 c1 e1 04 48 63 
> c7 31 d2 89 ff 48 8b 5d f8 c9 48 0f af f1 48 8d 44 06 ff <48> f7 f7 31 d2 31 
> c9 31 f6 31 ff 45 31 c0 45 31 c9 45 31 d2 45 31
>  RSP: 0018:b2c5c211fa30 EFLAGS: 00010206
>  RAX:  RBX:  RCX: 00f59b00
>  RDX:  RSI:  RDI: 
>  RBP: b2c5c211fa48 R08: 0001 R09: 0020
>  R10: 0004 R11:  R12: 00023b4a
>  R13: 91d37d165800 R14: 91d36fac6d80 R15: 91d34a764010
>  FS:  7f4a1ca3fa80() GS:91d6edbc() knlGS:
>  CS:  0010 DS:  ES:  CR0: 80050033
>  CR2: 559491d49000 CR3: 00011d180002 CR4: 003706f0
>  Call Trace:
>   
>   ? show_regs+0x6d/0x80
>   ? die+0x37/0xa0
>   ? do_trap+0xd4/0xf0
>   ? do_error_trap+0x71/0xb0
>   ? drm_dp_bw_overhead+0xb4/0x110 [drm_display_helper]
>   ? exc_divide_error+0x3a/0x70
>   ? drm_dp_bw_overhead+0xb4/0x110 [drm_display_helper]
>   ? asm_exc_divide_error+0x1b/0x20
>   ? drm_dp_bw_overhead+0xb4/0x110 [drm_display_helper]
>   ? drm_dp_calc_pbn_mode+0x2e/0x70 [drm_display_helper]
>   nv50_msto_atomic_check+0xda/0x120 [nouveau]
>   drm_atomic_helper_check_modeset+0xa87/0xdf0 [drm_kms_helper]
>   drm_atomic_helper_check+0x19/0xa0 [drm_kms_helper]
>   nv50_disp_atomic_check+0x13f/0x2f0 [nouveau]
>   drm_atomic_check_only+0x668/0xb20 [drm]
>   ? drm_connector_list_iter_next+0x86/0xc0 [drm]
>   drm_atomic_commit+0x58/0xd0 [drm]
>   ? __pfx___drm_printfn_info+0x10/0x10 [drm]
>   drm_atomic_connector_commit_dpms+0xd7/0x100 [drm]
>   drm_mode_obj_set_property_ioctl+0x1c5/0x450 [drm]
>   ? __pfx_drm_connector_property_set_ioctl+0x10/0x10 [drm]
>   drm_connector_property_set_ioctl+0x3b/0x60 [drm]
>   drm_ioctl_kernel+0xb9/0x120 [drm]
>   drm_ioctl+0x2d0/0x550 [drm]
>   ? __pfx_drm_connector_property_set_ioctl+0x10/0x10 [drm]
>   nouveau_drm_ioctl+0x61/0xc0 [nouveau]
>   __x64_sys_ioctl+0xa0/0xf0
>   do_syscall_64+0x76/0x140
>   ? do_syscall_64+0x85/0x140
>   ? do_syscall_64+0x85/0x140
>   entry_SYSCALL_64_after_hwframe+0x6e/0x76
>  RIP: 0033:0x7f4a1cd1a94f
>  Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 
> 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 
> ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
>  RSP: 002b:7ffd2f1df520 EFLAGS: 0246 ORIG_RAX: 0010
>  RAX: ffda RBX: 7ffd2f1df5b0 RCX: 7f4a1cd1a94f
>  RDX: 7ffd2f1df5b0 RSI: c01064ab RDI: 000f
>  RBP: c01064ab R08: 56347932deb8 R09: 56347a7d99c0
>  R10:  R11: 0246 R12: 56347938a220
>  R13: 000f R14: 563479d9f3f0 R15: 
>   
>  Modules linked in: rfcomm xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat 
> nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user 
> xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp 
> llc ccm cmac algif_hash overlay algif_skcipher af_alg bnep binfmt_misc 
> snd_sof_pci_intel_cnl snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_pci 
> snd_sof_xtensa_dsp snd_sof_intel_hda snd_sof snd_sof_utils 
> snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core snd_compress 
> snd_sof_intel_hda_mlink sn

Re: [Intel-gfx] [REGRESSION] Panic in gen8_ggtt_insert_entries() with v6.5

2023-09-29 Thread Linux regression tracking #update (Thorsten Leemhuis)
On 19.09.23 16:08, Bagas Sanjaya wrote:
> On Sat, Sep 02, 2023 at 06:14:12PM +0200, Oleksandr Natalenko wrote:
>>
>> Since v6.5 kernel the following HW:
>>
>> * Lenovo T460s laptop with Skylake GT2 [HD Graphics 520] (rev 07)
>> * Lenovo T490s laptop with WhiskeyLake-U GT2 [UHD Graphics 620] (rev 02)
> 
> #regzbot ^introduced: 0b62af28f249b9
> #regzbot title: gen8_ggtt_insert_entries() panic on Lenovo T14s (Tiger Lake) 
> due to folio_batch() on shmem_sg_free_table()
> #regzbot link: https://gitlab.freedesktop.org/drm/intel/-/issues/9256

#regzbot fix: i915: Limit the length of an sg list to the requested length
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.




Re: [Intel-gfx] [REGRESSION] HDMI connector detection broken in 6.3 on Intel(R) Celeron(R) N3060 integrated graphics

2023-08-13 Thread Linux regression tracking (Thorsten Leemhuis)
On 11.08.23 20:10, Mikhail Rudenko wrote:
> On 2023-08-11 at 08:45 +02, Thorsten Leemhuis  
> wrote:
>> On 10.08.23 21:33, Mikhail Rudenko wrote:
>>> The following is a copy of an issue I posted to drm/i915 gitlab [1] two
>>> months ago. I repost it to the mailing lists in hope that it will help
>>> the right people pay attention to it.
>>
>> Thx for your report. Wonder why Dmitry (who authored a4e771729a51) or
>> Thomas (who committed it) didn't look into this, but maybe the i915
>> devs didn't forward the report to them.

For the record: they did, and Jani mentioned it already. Sorry, should have
phrased this differently.

>> Let's see if these mails help. Just wondering: does reverting
>> a4e771729a51 from 6.5-rc5 or drm-tip help as well?
> 
> I've redone my tests with 6.5-rc5, and here are the results:
> (1) 6.5-rc5 -> still affected
> (2) 6.5-rc5 + revert a4e771729a51 -> not affected
> (3) 6.5-rc5 + two patches [1][2] suggested on i915 gitlab by @ideak -> not 
> affected (!)
> 
> Should we somehow tell regzbot about (3)?

That's good to know, thx. But the more important things are:

* When will those be merged? They are not in -next yet afaics, so it
might take some time to mainline them, especially at this point of the
devel cycle. Imre, could you try to prod the right people so that these
are ideally upstreamed rather sooner than later, as they fix a regression?
* If possible, they should ideally be tagged for backporting to 6.4, as
this is a regression from the 6.3 cycle.

But yes, let's tell regzbot that fixes are available, too:

#regzbot fix: drm/i915: Fix HPD polling, reenabling the output poll work
as needed

(for the record: that's the second of two patches apparently needed)

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

>> BTW, there was an earlier report about a problem with a4e771729a51 that
>> afaics was never addressed, but it might be unrelated.
>> https://lore.kernel.org/all/20230328023129.3596968-1-zhouzong...@kylinos.cn/
> [1] https://patchwork.freedesktop.org/patch/548590/?series=121050&rev=1
> [2] https://patchwork.freedesktop.org/patch/548591/?series=121050&rev=1



Re: [Intel-gfx] [REGRESSION] HDMI connector detection broken in 6.3 on Intel(R) Celeron(R) N3060 integrated graphics

2023-08-11 Thread Thorsten Leemhuis
[CCing the i915 maintainers and the dri maintainers]

Hi, Thorsten here, the Linux kernel's regression tracker.

On 10.08.23 21:33, Mikhail Rudenko wrote:
> The following is a copy of an issue I posted to drm/i915 gitlab [1] two
> months ago. I repost it to the mailing lists in hope that it will help
> the right people pay attention to it.

Thx for your report. Wonder why Dmitry (who authored a4e771729a51) or
Thomas (who committed it) didn't look into this, but maybe the i915
devs didn't forward the report to them.

Let's see if these mails help. Just wondering: does reverting
a4e771729a51 from 6.5-rc5 or drm-tip help as well?

BTW, there was an earlier report about a problem with a4e771729a51 that
afaics was never addressed, but it might be unrelated.

https://lore.kernel.org/all/20230328023129.3596968-1-zhouzong...@kylinos.cn/

Ciao, Thorsten

> After kernel upgrade from 6.2.13 to 6.3 HDMI connector detection is
> broken for me. Issue is 100% reproducible:
> 
> 1. Start system as usual with HDMI connected.
> 2. Disconnect HDMI
> 3. Connect HDMI back
> 4. Get "no signal" on display, connector status in sysfs is disconnected
> 
> Curiously, running xrandr over ssh like
> 
> ssh qnap251.local env DISPLAY=:0 xrandr
> 
> makes display come back. drm-tip tip is affected as well (last test
> 2023-08-02).
> 
> Bisecting points at a4e771729a51 ("drm/probe_helper: sort out poll_running vs 
> poll_enabled").
> Reverting that commit on top of 6.3 fixes the issue for me.
> 
> System information:
> * System architecture: x86_64
> * Kernel version: 6.3.arch1
> * Linux distribution: Arch Linux
> * Machine: QNAP TS-251A, CPU: Intel(R) Celeron(R) CPU N3060 @ 1.60GHz
> * Display connector: single HDMI display
> * dmesg with debug information (captured on drm-tip, following above 4 
> steps): [2]
> * xrandr output:
> 
> Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 16384 x 16384
> DP-1 disconnected (normal left inverted right x axis y axis)
> HDMI-1 connected primary 1920x1080+0+0 (normal left inverted right x axis 
> y axis) 708mm x 398mm
>    1920x1080     60.00*+  50.00    59.94    30.00    25.00    24.00
>                  29.97    23.98
>    1920x1080i    60.00    50.00    59.94
>    1360x768      59.80
>    1280x768      60.35
>    1280x720      60.00    50.00    59.94
>    1024x768      75.03    70.07    60.00
>    832x624       74.55
>    800x600       75.00    60.32
>    720x576       50.00
>    720x480       60.00    59.94
>    640x480       75.00    60.00    59.94
>    720x400       70.08
> DP-2 disconnected (normal left inverted right x axis y axis)
> HDMI-2 disconnected (normal left inverted right x axis y axis)
> 
> I'm willing to provide additional information and/or test fixes.
> 
> [1] https://gitlab.freedesktop.org/drm/intel/-/issues/8451
> [2] 
> https://gitlab.freedesktop.org/drm/intel/uploads/fda7aff0b13ef20962856c2c7be51544/dmesg.txt
> 
> #regzbot introduced: a4e771729a51
> 
> --
> Best regards,
> Mikhail Rudenko


Re: [Intel-gfx] alderlake crashes (random memory corruption?) with 6.0 i915 / ucode related

2022-10-17 Thread Thorsten Leemhuis
CCing the regression mailing list, as it should be in the loop for all
regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

On 17.10.22 12:48, Hans de Goede wrote:
> On 10/17/22 10:39, Jani Nikula wrote:
>> On Mon, 17 Oct 2022, Jani Nikula  wrote:
>>> On Thu, 13 Oct 2022, Hans de Goede  wrote:
>>>> With 6.0 the following WARN triggers:
>>>> drivers/gpu/drm/i915/display/intel_bios.c:477:
>>>>
>>>> drm_WARN(&i915->drm, min_size == 0,
>>>>  "Block %d min_size is zero\n", section_id);
>>>
>>> What's the value of section_id that gets printed?
>>
>> I'm guessing this is [1] fixed by commit d3a7051841f0 ("drm/i915/bios:
>> Use hardcoded fp_timing size for generating LFP data pointers") in
>> v6.1-rc1.
>>
>> I don't think this is the root cause for your issues, but I wonder if
>> you could try v6.1-rc1 or drm-tip and see if we've fixed the other stuff
>> already too?
> 
> 6.1-rc1 indeed does not trigger the drm_WARN and for now (couple of
> reboots, running for 5 minutes now) it seems stable. 6.0.0 usually
> crashed during boot (but not always).
> 
> Do you think it would be worthwhile to try 6.0.0 with d3a7051841f0 ?
> 
> Any other commits which I can try before I go down the bisect route ?
> 
> (I'm assuming this will also affect other users, so we really need
> to fix this for 6.0.x

+1

> before it starts hitting Arch + Fedora users)

FWIW, I heard both openSUSE Tumbleweed and Arch switched to 6.0.y in the
past few days already.

Ciao, Thorsten


Re: [Intel-gfx] Regression on 5.19.12, display flickering on Framework laptop

2022-10-03 Thread Thorsten Leemhuis



On 03.10.22 19:48, Ville Syrjälä wrote:
> On Mon, Oct 03, 2022 at 08:45:18PM +0300, Ville Syrjälä wrote:
>> On Sat, Oct 01, 2022 at 12:07:39PM +0200, Thorsten Leemhuis wrote:
>>> On 30.09.22 14:26, Jerry Ling wrote:
>>>>
>>>> looks like someone has done it:
>>>> https://bbs.archlinux.org/viewtopic.php?pid=2059823#p2059823
>>>>
>>>> and the bisect points to:
>>>>
>>>> # first bad commit: [fc6aff984b1c63d6b9e54f5eff9cc5ac5840bc8c]
>>>> drm/i915/bios: Split VBT data into per-panel vs. global parts
>>>>
>>>> Best, Jerry
>>>
>>> FWIW, that's 3cf050762534 in mainline. Adding Ville, its author to the
>>> list of recipients.
>>
>> I definitely had no plans to backport any of that stuff,
>> but I guess the automagics did it anyway.
>>
>> Looks like stable is at least missing this pile of stuff:
>> 50759c13735d drm/i915/pps: Keep VDD enabled during eDP probe
>> 67090801489d drm/i915/pps: Reinit PPS delays after VBT has been fully parsed
>> 8e75e8f573e1 drm/i915/pps: Split PPS init+sanitize in two
>> 586294c3c186 drm/i915/pps: Stash away original BIOS programmed PPS delays
>> 89fcdf430599 drm/i915/pps: Don't apply quirks/etc. to the VBT PPS delays if 
>> they haven't been initialized
>> 60b02a09598f drm/i915/pps: Introduce pps_delays_valid()
>>
>> But dunno if even that is enough.

If you need testers: David (now CCed) apparently has an affected machine
and offered to test patches in a different subthread of this thread.

>> This bug report is probably the same thing:
>> https://gitlab.freedesktop.org/drm/intel/-/issues/7013

Sounds like it.

 > Also cc intel-gfx...

Ahh, sorry, should have done that when I CCed you.

Ciao, Thorsten


>>> Did anyone check if a revert on top of 5.19.12 works easily and solves
>>> the problem?
>>>
>>> And does anybody know if mainline is affected, too?
>>>
>>> Ciao, Thorsten
>>>
>>>
>>>> On 9/30/22 07:11, Slade Watkins wrote:
>>>>> Hey Greg,
>>>>>
>>>>>> On Sep 30, 2022, at 1:59 AM, Greg KH  wrote:
>>>>>>
>>>>>> On Fri, Sep 30, 2022 at 06:37:48AM +0200, Greg KH wrote:
>>>>>>> On Thu, Sep 29, 2022 at 10:26:25PM -0400, Jerry Ling wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> It has been reported by multiple users across a handful of distros
>>>>>>>> that
>>>>>>>> there seems to be regression on Framework laptop (which presumably
>>>>>>>> is not
>>>>>>>> that special in terms of mobo and display)
>>>>>>>>
>>>>>>>> Ref:
>>>>>>>> https://community.frame.work/t/psa-dont-upgrade-to-linux-kernel-5-19-12-arch1-1-on-arch-linux-gen-11-model/23171
>>>>>>> Can anyone do a 'git bisect' to find the offending commit?
>>>>>> Also, this works for me on a gen 12 framework laptop:
>>>>>> $ uname -a
>>>>>> Linux frame 5.19.12 #68 SMP PREEMPT_DYNAMIC Fri Sep 30 07:02:33
>>>>>> CEST 2022 x86_64 GNU/Linux
>>>>>>
>>>>>> so there's something odd with the older hardware?
>>>>>>
>>>>>> greg k-h
>>>>> Could be. Running git bisect for 5.19.11 and 5.19.12 (as suggested by
>>>>> the linked forum thread) returned nothing on gen 11 for me.
>>>>>
>>>>> This is very odd,
>>>>> -srw
>>>>
>>>>
>>
>> -- 
>> Ville Syrjälä
>> Intel
> 


Re: [Intel-gfx] Xorg SEGV in Xen PV dom0 after updating from 5.16.18 to 5.17.5 #forregzbot

2022-08-16 Thread Thorsten Leemhuis
TWIMC: this mail is primarily sent for documentation purposes and for
regzbot, my Linux kernel regression tracking bot. These mails usually
contain '#forregzbot' in the subject, to make them easy to spot and filter.

On 04.05.22 07:46, Thorsten Leemhuis wrote:
> On 04.05.22 02:37, Marek Marczykowski-Górecki wrote:
>>
>> After updating from 5.16.18 to 5.17.5 in Xen PV dom0, my Xorg started
>> crashing when displaying any window mapped from a guest (domU) system.
>> This is 100% reproducible.
>> The system is Qubes OS, and it uses a trick that maps windows content
>> from other guests using Xen grant tables, wrapped as "shared memory"
>> from Xorg point of view (so, the memory that Xorg mmaps is not just from
>> another process, but from another VM). That's the ShmPutImage you can
>> see on the stack trace below.
>> [...]
>> I don't see any related kernel or Xen messages at this time. Xorg's SEGV
>> handler prints also:
>>
>> (EE) Segmentation fault at address 0x3c010
>>
>> Git bisect says it's bdd8b6c98239cad ("drm/i915: replace X86_FEATURE_PAT
>> with pat_enabled()"), and indeed with this commit reverted on top of
>> 5.17.5 everything works fine.
>>
>> I guess this part of dom0's boot dmesg may be relevant:
>>
>> [0.000949] x86/PAT: MTRRs disabled, skipping PAT initialization too.
>> [0.000953] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WC  WP  UC  UC  
>>
>> Originally reported at
>> https://github.com/QubesOS/qubes-issues/issues/7479
>>  
>> #regzbot introduced bdd8b6c98239cad
>> #regzbot monitor: https://github.com/QubesOS/qubes-issues/issues/7479

#regzbot fixed-by: 72cbc8f04fe2fa9


Re: [Intel-gfx] [PATCH 1/1] drm/i915/guc: Update to GuC version 70.1.1 #forregzbot

2022-07-15 Thread Thorsten Leemhuis
[TLDR: I'm adding this regression report to the list of tracked
regressions; all text from me you find below is based on a few template
paragraphs you might have encountered already in similar form.]

Hi, this is your Linux kernel regression tracker.

On 15.07.22 01:08, Dave Airlie wrote:
> On Fri, 15 Apr 2022 at 10:15, Matt Roper  wrote:
>>
>> On Tue, Apr 12, 2022 at 03:59:55PM -0700, john.c.harri...@intel.com wrote:
>>> From: John Harrison 
>>>
>>> The latest GuC firmware drops the context descriptor pool in favour of
>>> passing all creation data in the create H2G. It also greatly simplifies
>>> the work queue and removes the process descriptor used for multi-LRC
>>> submission. So, remove all mention of LRC and process descriptors and
>>> update the registration code accordingly.
>>>
>>> Unfortunately, the new API also removes the ability to set default
>>> values for the scheduling policies at context registration time.
>>> Instead, a follow up H2G must be sent. The individual scheduling
>>> policy update H2G commands are also dropped in favour of a single KLV
>>> based H2G. So, change the update wrappers accordingly and call this
>>> during context registration..
>>>
>>> Of course, this second H2G per registration might fail due to being
>>> backed up. The registration code has a complicated state machine to
>>> cope with the actual registration call failing. However, if that works
>>> then there is no support for unwinding if a further call should fail.
>>> Unwinding would require sending a H2G to de-register - but that can't
>>> be done because the CTB is already backed up.
>>>
>>> So instead, add a new flag to say whether the context has a pending
>>> policy update. This is set if the policy H2G fails at registration
>>> time. The submission code checks for this flag and retries the policy
>>> update if set. If that call fails, the submission path exits early
>>> with a retry error. This is something that is already supported for
>>> other reasons.
>>>
>>> Signed-off-by: John Harrison 
>>> Reviewed-by: Daniele Ceraolo Spurio 
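
The pending-policy flag described in the commit message above can be modelled roughly as follows. This is a Python illustration of the described state handling only, not the i915 code; `Context`, `register`, `submit` and the `send_h2g` callback are hypothetical names, and the registration H2G itself is assumed to have succeeded.

```python
# Illustrative model only; all names here are hypothetical, not i915 symbols.
from typing import Callable

OK = "ok"
RETRY = "retry"


class Context:
    def __init__(self) -> None:
        self.registered = False
        self.policy_pending = False  # set when the policy H2G could not be sent


def register(ctx: Context, send_h2g: Callable[[str], bool]) -> None:
    # Assumption: the registration call itself went through.
    ctx.registered = True
    # The follow-up policy H2G may fail if the channel is backed up.
    # Instead of unwinding the registration, remember that it is pending.
    if not send_h2g("policy_update"):
        ctx.policy_pending = True


def submit(ctx: Context, send_h2g: Callable[[str], bool]) -> str:
    # Retry the deferred policy update before submitting any work.
    if ctx.policy_pending:
        if not send_h2g("policy_update"):
            return RETRY  # exit early; the caller retries later
        ctx.policy_pending = False
    return OK
```

The point of the flag is that no de-registration message is ever needed: a failed policy update is simply retried on the submission path.
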
>>
>> Applied to drm-intel-gt-next.  Thanks for the patch and review.
>>
> 
> (cc'ing Linus and danvet, as a headsup, there is also a phoronix
> article where this was discovered).
> 
> Okay WTF.
> 
> This is in no way acceptable. This needs to be fixed in 5.19-rc ASAP.
> 
> Once hardware is released and we remove the gate flag by default, you
> cannot just bump firmware versions blindly.
> 
> The kernel needs to retain compatibility with all released firmwares
> since a device was declared supported.
> 
> This needs to be reverted, and then 70 should be introduced with a
> fallback to 69 versions.
> 
> Very disappointing, I expect this to get dealt with v.quickly.
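
The rule stated above (prefer the new firmware, but keep working with firmware that was already released) could be sketched like this. It is a Python illustration of the fallback idea, not the driver's loader; the blob names, `CANDIDATES` and `pick_firmware` are hypothetical.

```python
# Illustrative only: models "introduce 70 with a fallback to 69".
from typing import Iterable, Optional, Sequence

# Hypothetical blob names, newest first.
CANDIDATES = ("guc_70.1.1.bin", "guc_69.bin")


def pick_firmware(available: Iterable[str],
                  candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the newest candidate that is actually installed, if any."""
    installed = set(available)
    for name in candidates:
        if name in installed:
            return name
    return None  # no supported firmware found
```

With this ordering, a system that only ships the older blob keeps working after a kernel update, while systems that have the newer blob use it.
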

To be sure the issue below doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced 2584b3549f4c4081
#regzbot title
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply -- ideally with also
telling regzbot about it, as explained here:
https://linux-regtracking.leemhuis.info/tracked-regression/
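
For readers who want to pull such commands out of a mail, a minimal sketch follows. It only splits `#regzbot` lines into a command and an argument; it is not regzbot's own parser, and skipping quoted lines is a design choice of this sketch, not a statement about how regzbot behaves. The function name is hypothetical.

```python
# Minimal sketch; not regzbot's implementation.
import re
from typing import List, Tuple

_CMD = re.compile(r"^\s*#regzbot\s+(\S+)\s*(.*)$")


def extract_regzbot_commands(mail_body: str) -> List[Tuple[str, str]]:
    """Return (command, argument) pairs from unquoted '#regzbot' lines."""
    commands = []
    for line in mail_body.splitlines():
        # Assumption of this sketch: commands in quoted text are ignored.
        if line.lstrip().startswith(">"):
            continue
        match = _CMD.match(line)
        if match:
            command = match.group(1).rstrip(":")  # "fixed-by:" -> "fixed-by"
            commands.append((command, match.group(2).strip()))
    return commands
```

Arguments that continue on the following line (as some mails in this archive do for long URLs) are not handled here.
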

Reminder for developers: When fixing the issue, add 'Link:' tags
pointing to the report (the mail this one replies to), as explained
in the Linux kernel's documentation; the above webpage explains why this is
important for tracked regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.


Re: [Intel-gfx] [PATCH 2/2] x86/pat: add functions to query specific cache mode availability

2022-05-25 Thread Thorsten Leemhuis
On 25.05.22 10:37, Jan Beulich wrote:
> On 25.05.2022 09:45, Thorsten Leemhuis wrote:
>> On 24.05.22 20:32, Chuck Zmudzinski wrote:
>>> On 5/21/22 6:47 AM, Thorsten Leemhuis wrote:
>>>> I'm not a developer and I don't know the details of this thread and
>>>> the backstory of the regression, but it sounds like that's the approach
>>>> that is needed here until someone comes up with a fix for the regression
>>>> exposed by bdd8b6c98239.
>>>>
>>>> But if I'm wrong, please tell me.
>>>
>>> You are mostly right, I think. Reverting bdd8b6c98239 fixes
>>> it. There is another way to fix it, though.
>>
>> Yeah, I'm aware of it. But it seems...
>>
>>> The patch proposed
>>> by Jan Beulich also fixes the regression on my system, so as
>>> the person reporting this is a regression, I would also be satisfied
>>> with Jan's patch instead of reverting bdd8b6c98239 as a fix. Jan
>>> posted his proposed patch here:
>>>
>>> https://lore.kernel.org/lkml/9385fa60-fa5d-f559-a137-6608408f8...@suse.com/
>>
>> ...that approach is not making any progress either?
>>
>> Jan, could you provide a short status update here? I'd really like to
>> get this regression fixed one way or another sooner rather than later,
>> as this has taken way too long already IMHO.
> 
> What kind of status update could I provide? I've not heard back from
> anyone of the maintainers, so I have no way to know what (if anything)
> I need to do.

That is perfectly fine as a status update for me (I track a lot of
regressions and it's easy to miss updated patches, discussions in other
places, and things like that).

Could you maybe send a reminder to the maintainer that this is a fix for
a regression that is bothering people and needs to be handled with high
priority? Feel free to tell them the Linux kernel regression tracker is
pestering you because things are taking so long. :-D

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.
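
The core of the regression discussed in this thread, as far as it can be read from the mails, is that three different questions were being conflated: does the CPU advertise PAT, does the kernel consider PAT enabled, and is a write-combining (WC) entry actually available. The Python model below illustrates why the answers can differ. It is not kernel code; `PatState`, the predicate functions and the `XEN_PV_DOM0` values are assumptions inferred from the reports and the dmesg lines quoted in this archive.

```python
# Illustrative model only; names and values are assumptions, not kernel symbols.
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PatState:
    cpu_has_pat: bool          # models the CPU feature bit
    kernel_pat_enabled: bool   # models what a pat_enabled()-style check reports
    entries: Tuple[str, ...]   # models the memory types in the PAT table


def wc_by_feature_bit(state: PatState) -> bool:
    return state.cpu_has_pat


def wc_by_pat_enabled(state: PatState) -> bool:
    return state.kernel_pat_enabled


def wc_by_table(state: PatState) -> bool:
    # Ask the specific question: is a WC entry available at all?
    return "WC" in state.entries


def mapping_mode(wc_ok: bool) -> str:
    # Without WC a driver can fall back to UC: slower, but it should work.
    return "WC" if wc_ok else "UC"


# Assumed state for the reported Xen PV dom0: the kernel skipped its own PAT
# initialization, yet the configuration it logged still contains WC.
XEN_PV_DOM0 = PatState(
    cpu_has_pat=True,
    kernel_pat_enabled=False,
    entries=("WB", "WT", "UC-", "UC", "WC", "WP", "UC", "UC"),
)
```

In this model the `pat_enabled()`-style check yields UC on the Xen PV dom0 although WC is present in the table, which is the kind of false negative the thread is about.
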


Re: [Intel-gfx] [PATCH 2/2] x86/pat: add functions to query specific cache mode availability

2022-05-25 Thread Thorsten Leemhuis



On 24.05.22 20:32, Chuck Zmudzinski wrote:
> On 5/21/22 6:47 AM, Thorsten Leemhuis wrote:
>> On 20.05.22 16:48, Chuck Zmudzinski wrote:
>>> On 5/20/2022 10:06 AM, Jan Beulich wrote:
>>>> On 20.05.2022 15:33, Chuck Zmudzinski wrote:
>>>>> On 5/20/2022 5:41 AM, Jan Beulich wrote:
>>>>>> On 20.05.2022 10:30, Chuck Zmudzinski wrote:
>>>>>>> On 5/20/2022 2:59 AM, Chuck Zmudzinski wrote:
>>>>>>>> On 5/20/2022 2:05 AM, Jan Beulich wrote:
>>>>>>>>> On 20.05.2022 06:43, Chuck Zmudzinski wrote:
>>>>>>>>>> On 5/4/22 5:14 AM, Juergen Gross wrote:
>>>>>>>>>>> On 04.05.22 10:31, Jan Beulich wrote:
>>>>>>>>>>>> On 03.05.2022 15:22, Juergen Gross wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> ... these uses there are several more. You say nothing on why
>>>>>>>>>>>> those want
>>>>>>>>>>>> leaving unaltered. When preparing my earlier patch I did
>>>>>>>>>>>> inspect them
>>>>>>>>>>>> and came to the conclusion that these all would also better
>>>>>>>>>>>> observe the
>>>>>>>>>>>> adjusted behavior (or else I couldn't have left pat_enabled()
>>>>>>>>>>>> as the
>>>>>>>>>>>> only predicate). In fact, as said in the description of my
>>>>>>>>>>>> earlier
>>>>>>>>>>>> patch, in
>>>>>>>>>>>> my debugging I did find the use in i915_gem_object_pin_map()
>>>>>>>>>>>> to be
>>>>>>>>>>>> the
>>>>>>>>>>>> problematic one, which you leave alone.
>>>>>>>>>>> Oh, I missed that one, sorry.
>>>>>>>>>> That is why your patch would not fix my Haswell unless
>>>>>>>>>> it also touches i915_gem_object_pin_map() in
>>>>>>>>>> drivers/gpu/drm/i915/gem/i915_gem_pages.c
>>>>>>>>>>
>>>>>>>>>>> I wanted to be rather defensive in my changes, but I agree at
>>>>>>>>>>> least
>>>>>>>>>>> the
>>>>>>>>>>> case in arch_phys_wc_add() might want to be changed, too.
>>>>>>>>>> I think your approach needs to be more aggressive so it will fix
>>>>>>>>>> all the known false negatives introduced by bdd8b6c98239
>>>>>>>>>> such as the one in i915_gem_object_pin_map().
>>>>>>>>>>
>>>>>>>>>> I looked at Jan's approach and I think it would fix the issue
>>>>>>>>>> with my Haswell as long as I don't use the nopat option. I
>>>>>>>>>> really don't have a strong opinion on that question, but I
>>>>>>>>>> think the nopat option as a Linux kernel option, as opposed
>>>>>>>>>> to a hypervisor option, should only affect the kernel, and
>>>>>>>>>> if the hypervisor provides the pat feature, then the kernel
>>>>>>>>>> should not override that,
>>>>>>>>> Hmm, why would the kernel not be allowed to override that? Such
>>>>>>>>> an override would affect only the single domain where the
>>>>>>>>> kernel runs; other domains could take their own decisions.
>>>>>>>>>
>>>>>>>>> Also, for the sake of completeness: "nopat" used when running on
>>>>>>>>> bare metal has the same bad effect on system boot, so there
>>>>>>>>> pretty clearly is an error cleanup issue in the i915 driver. But
>>>>>>>>> that's orthogonal, and I expect the maintainers may not even care
>>>>>>>>> (but tell us "don't do that then").
>>>>>>> Actually I just did a test with the last official Debian kernel
>>>>>>> build of Linux 5.16, that is, a kernel before bdd8b6c98239 was
>>>>>>> applied. In fact, the nopat option does *not* break the i915 driver
>>>

Re: [Intel-gfx] [PATCH 2/2] x86/pat: add functions to query specific cache mode availability

2022-05-21 Thread Thorsten Leemhuis
On 20.05.22 16:48, Chuck Zmudzinski wrote:
> On 5/20/2022 10:06 AM, Jan Beulich wrote:
>> On 20.05.2022 15:33, Chuck Zmudzinski wrote:
>>> On 5/20/2022 5:41 AM, Jan Beulich wrote:
>>>> On 20.05.2022 10:30, Chuck Zmudzinski wrote:
>>>>> On 5/20/2022 2:59 AM, Chuck Zmudzinski wrote:
>>>>>> On 5/20/2022 2:05 AM, Jan Beulich wrote:
>>>>>>> On 20.05.2022 06:43, Chuck Zmudzinski wrote:
>>>>>>>> On 5/4/22 5:14 AM, Juergen Gross wrote:
>>>>>>>>> On 04.05.22 10:31, Jan Beulich wrote:
>>>>>>>>>> On 03.05.2022 15:22, Juergen Gross wrote:
>>>>>>>>>>
>>>>>>>>>> ... these uses there are several more. You say nothing on why
>>>>>>>>>> those want
>>>>>>>>>> leaving unaltered. When preparing my earlier patch I did
>>>>>>>>>> inspect them
>>>>>>>>>> and came to the conclusion that these all would also better
>>>>>>>>>> observe the
>>>>>>>>>> adjusted behavior (or else I couldn't have left pat_enabled()
>>>>>>>>>> as the
>>>>>>>>>> only predicate). In fact, as said in the description of my
>>>>>>>>>> earlier
>>>>>>>>>> patch, in
>>>>>>>>>> my debugging I did find the use in i915_gem_object_pin_map()
>>>>>>>>>> to be
>>>>>>>>>> the
>>>>>>>>>> problematic one, which you leave alone.
>>>>>>>>> Oh, I missed that one, sorry.
>>>>>>>> That is why your patch would not fix my Haswell unless
>>>>>>>> it also touches i915_gem_object_pin_map() in
>>>>>>>> drivers/gpu/drm/i915/gem/i915_gem_pages.c
>>>>>>>>
>>>>>>>>> I wanted to be rather defensive in my changes, but I agree at
>>>>>>>>> least
>>>>>>>>> the
>>>>>>>>> case in arch_phys_wc_add() might want to be changed, too.
>>>>>>>> I think your approach needs to be more aggressive so it will fix
>>>>>>>> all the known false negatives introduced by bdd8b6c98239
>>>>>>>> such as the one in i915_gem_object_pin_map().
>>>>>>>>
>>>>>>>> I looked at Jan's approach and I think it would fix the issue
>>>>>>>> with my Haswell as long as I don't use the nopat option. I
>>>>>>>> really don't have a strong opinion on that question, but I
>>>>>>>> think the nopat option as a Linux kernel option, as opposed
>>>>>>>> to a hypervisor option, should only affect the kernel, and
>>>>>>>> if the hypervisor provides the pat feature, then the kernel
>>>>>>>> should not override that,
>>>>>>> Hmm, why would the kernel not be allowed to override that? Such
>>>>>>> an override would affect only the single domain where the
>>>>>>> kernel runs; other domains could take their own decisions.
>>>>>>>
>>>>>>> Also, for the sake of completeness: "nopat" used when running on
>>>>>>> bare metal has the same bad effect on system boot, so there
>>>>>>> pretty clearly is an error cleanup issue in the i915 driver. But
>>>>>>> that's orthogonal, and I expect the maintainers may not even care
>>>>>>> (but tell us "don't do that then").
>>>>> Actually I just did a test with the last official Debian kernel
>>>>> build of Linux 5.16, that is, a kernel before bdd8b6c98239 was
>>>>> applied. In fact, the nopat option does *not* break the i915 driver
>>>>> in 5.16. That is, with the nopat option, the i915 driver loads
>>>>> normally on both the bare metal and on the Xen hypervisor.
>>>>> That means your presumption (and the presumption of
>>>>> the author of bdd8b6c98239) that the "nopat" option was
>>>>> being observed by the i915 driver is incorrect. Setting "nopat"
>>>>> had no effect on my system with Linux 5.16. So after doing these
>>>>> tests, I am against the aggressive approach of breaking the i915
>>>>> driver with the "nopat" option because prior to bdd8b6c98239,
>>>>> nopat did not break the i915 driver. Why break it now?
>>>> Because that, in my understanding, is the purpose of "nopat"
>>>> (not breaking the driver of course - that's a driver bug -, but
>>>> having an effect on the driver).
>>> I wouldn't call it a driver bug, but an incorrect configuration of the
>>> kernel by the user.  I presume X86_FEATURE_PAT is required by the
>>> i915 driver
>> The driver ought to work fine without PAT (and hence without being
>> able to make WC mappings). It would use UC instead and be slow, but
>> it ought to work.
>>
>>> and therefore the driver should refuse to disable
>>> it if the user requests to disable it and instead warn the user that
>>> the driver did not disable the feature, contrary to what the user
>>> requested with the nopat option.
>>>
>>> In any case, my test did not verify that when nopat is set in Linux
>>> 5.16,
>>> the thread takes the same code path as when nopat is not set,
>>> so I am not totally sure that the reason nopat does not break the
>>> i915 driver in 5.16 is that static_cpu_has(X86_FEATURE_PAT)
>>> returns true even when nopat is set. I could test it with a custom
>>> log message in 5.16 if that is necessary.
>>>
>>> Are you saying it was wrong for static_cpu_has(X86_FEATURE_PAT)
>>> to return true in 5.16 when the user requests nopat?
>> No, I'm not saying that. It was wrong for this construct to be used
>> in the driver, which was fixed for 5.17 (and which had caused the
>> regression I di

Re: [Intel-gfx] Xorg SEGV in Xen PV dom0 after updating from 5.16.18 to 5.17.5

2022-05-15 Thread Thorsten Leemhuis
On 04.05.22 08:48, Juergen Gross wrote:
> On 04.05.22 07:46, Thorsten Leemhuis wrote:
>> Hi, this is your Linux kernel regression tracker. Sending this just to
>> CC the developers of the culprit mentioned below (bdd8b6c98239cad
>> ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")) and the
>> maintainers for the subsystem.
>>
>> While at it a quick note: I wonder if this is a problem similar to one
>> that recently turned up with amdgpu and is fixed by this commit:
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=78b12008f20
> 
> No, this is different.
> 
> I have posted a patch yesterday which should fix the issue:
> 
> https://lore.kernel.org/lkml/20220503132207.17234-3-jgr...@suse.com/T/#m75efc68c96d8f7160229b5f3147242221ce0c28c

What happened to that? It looks like there wasn't any progress in the
past week to get this regression fixed, which sometimes happens, but is
kinda undesired when it comes to regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

#regzbot poke

>> Ciao, Thorsten
>>
>> On 04.05.22 02:37, Marek Marczykowski-Górecki wrote:
>>>
>>> After updating from 5.16.18 to 5.17.5 in Xen PV dom0, my Xorg started
>>> crashing when displaying any window mapped from a guest (domU) system.
>>> This is 100% reproducible.
>>> The system is Qubes OS, and it uses a trick that maps windows content
>>> from other guests using Xen grant tables, wrapped as "shared memory"
>>> from Xorg point of view (so, the memory that Xorg mmaps is not just from
>>> another process, but from another VM). That's the ShmPutImage you can
>>> see on the stack trace below.
>>>
>>> Stack trace of thread 12858:
>>> #0  0x7f80029e17d5 raise (libc.so.6 + 0x3c7d5)
>>> #1  0x7f80029ca895 abort (libc.so.6 + 0x25895)
>>> #2  0x5b3469ace0e0 OsAbort (Xorg + 0x1c60e0)
>>> #3  0x5b3469ad3959 AbortServer (Xorg + 0x1cb959)
>>> #4  0x5b3469ad46aa FatalError (Xorg + 0x1cc6aa)
>>> #5  0x5b3469acb450 OsSigHandler (Xorg + 0x1c3450)
>>> #6  0x7f8002b85a90 __restore_rt (libpthread.so.0 + 0x14a90)
>>> #7  0x7f8002b0a2a1 __memmove_avx_unaligned_erms (libc.so.6 + 0x1652a1)
>>> #8  0x7f80015dfcc9 linear_to_xtiled_faster (iris_dri.so + 0xc91cc9)
>>> #9  0x7f80015e3477 _isl_memcpy_linear_to_tiled (iris_dri.so + 0xc95477)
>>> #10 0x7f8001468440 iris_texture_subdata (iris_dri.so + 0xb1a440)
>>> #11 0x7f8000a76107 st_TexSubImage (iris_dri.so + 0x128107)
>>> #12 0x7f8000be9a47 texture_sub_image (iris_dri.so + 0x29ba47)
>>> #13 0x7f8000becd0c texsubimage_err (iris_dri.so + 0x29ed0c)
>>> #14 0x7f8000bf2939 _mesa_TexSubImage2D (iris_dri.so + 0x2a4939)
>>> #15 0x7f800213831f glamor_upload_boxes (libglamoregl.so + 0x1e31f)
>>> #16 0x7f800213856f glamor_upload_region (libglamoregl.so + 0x1e56f)
>>> #17 0x7f800212aea6 glamor_put_image (libglamoregl.so + 0x10ea6)
>>> #18 0x5b3469a4d79c damagePutImage (Xorg + 0x14579c)
>>> #19 0x5b3469a00a7e ProcShmPutImage (Xorg + 0xf8a7e)
>>> #20 0x5b3469965a2b Dispatch (Xorg + 0x5da2b)
>>> #21 0x5b3469969b04 dix_main (Xorg + 0x61b04)
>>> #22 0x7f80029cc082 __libc_start_main (libc.so.6 + 0x27082)
>>> #23 0x5b3469952e6e _start (Xorg + 0x4ae6e)
>>>
>>> Disassembly of the surrounding code:
>>>
>>>     0x7596ae8c82fb <+123>:    ja 0x7596ae8c8338 <__memmove_avx_unaligned_erms+184>
>>>     0x7596ae8c82fd <+125>:    jb 0x7596ae8c8304 <__memmove_avx_unaligned_erms+132>
>>>     0x7596ae8c82ff <+127>:    movzbl (%rsi),%ecx
>>>     0x7596ae8c8302 <+130>:    mov    %cl,(%rdi)
>>>     0x7596ae8c8304 <+132>:    retq
>>>     0x7596ae8c8305 <+133>:    vmovdqu (%rsi),%xmm0
>>>     0x7596ae8c8309 <+137>:    vmovdqu -0x10(%rsi,%rdx,1),%xmm1
>>> => 0x7596ae8c830f <+143>:    vmovdqu %xmm0,(%rdi)
>>>     0x7596ae8c8313 <+147>:    vmovdqu %xmm1,-0x10(%rdi,%rdx,1)
>>>     0x7596ae8c8319 <+153>:    retq
>>>
>>>
>>> I don't see 

Re: [Intel-gfx] [5.18 regression] drm/i915 BYT rendering broken due to "Remove short-term pins from execbuf, v6" #forregzbot

2022-05-13 Thread Thorsten Leemhuis
TWIMC: this mail is primarily sent for documentation purposes and for
regzbot, my Linux kernel regression tracking bot. These mails usually
contain '#forregzbot' in the subject, to make them easy to spot and filter.

On 09.05.22 09:01, Thorsten Leemhuis wrote:
> [TLDR: I'm adding this regression report to the list of tracked
> regressions; all text from me you find below is based on a few template
> paragraphs you might have encountered already in similar form.]
> 
> Hi, this is your Linux kernel regression tracker. Top-posting for once,
> to make this easily accessible to everyone.
> 
> Thanks for the report.
> 
> To be sure the issue below doesn't fall through the cracks unnoticed, I'm
> adding it to regzbot, my Linux kernel regression tracking bot:
> 
> #regzbot ^introduced b5cfe6f7a6e1
> #regzbot title drm/i915: BYT rendering broken due to "Remove short-term
> pins from execbuf, v6"
> #regzbot ignore-activity

#regzbot link: https://gitlab.freedesktop.org/drm/intel/-/issues/5806
#regzbot monitor:
https://lore.kernel.org/all/2022055219.46507-1-maarten.lankho...@linux.intel.com/

> This isn't a regression? This issue or a fix for it are already
> discussed somewhere else? It was fixed already? You want to clarify when
> the regression started to happen? Or point out I got the title or
> something else totally wrong? Then just reply -- ideally with also
> telling regzbot about it, as explained here:
> https://linux-regtracking.leemhuis.info/tracked-regression/
> 
> Reminder for developers: When fixing the issue, add 'Link:' tags
> pointing to the report (the mail this one replied to), as the kernel's
> documentation calls for; the above page explains why this is important for
> tracked regressions.
> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> 
> P.S.: As the Linux kernel's regression tracker I deal with a lot of
> reports and sometimes miss something important when writing mails like
> this. If that's the case here, don't hesitate to tell me in a public
> reply, it's in everyone's interest to set the public record straight.
> 
> 
> 
> On 08.05.22 16:38, Hans de Goede wrote:
>> Hi All,
>>
>> When running a 5.18-rc4 (and -rc5) kernel on a Chuwi Hi 8, which is
>> a Bay Trail based tablet with 2G RAM and a 1200x1920 DSI panel,
>> I noticed that gnome-shell was misrendering. Many UI elements were
>> missing (they were all black) and at the gdm login screen (which is
>> a special gnome-shell session) the screen often was entirely black
>> until I moved the cursor around and then various things got
>> highlighted after which they sometimes stuck around and sometimes
>> they disappeared again after the highlight.
>>
>> Since this problem does not happen with various 5.17.y kernels I
>> believe that this is a kernel regression in 5.18. I've bisected this
>> and the bisect points to:
>>
>> commit b5cfe6f7a6e1 ("drm/i915: Remove short-term pins from execbuf, v6.")
>>
>> from Maarten. This commit cleanly reverts on top of 5.18-rc5 and
>> I can confirm that 5.18-rc5 with b5cfe6f7a6e1 reverted fixes things.
>>
>> I would be more than happy to test any possible fixes for this.
>>
>> Regards,
>>
>> Hans
>>
>>
>>


Re: [Intel-gfx] [5.18 regression] drm/i915 BYT rendering broken due to "Remove short-term pins from execbuf, v6"

2022-05-09 Thread Thorsten Leemhuis
[TLDR: I'm adding this regression report to the list of tracked
regressions; all text from me you find below is based on a few template
paragraphs you might have encountered already in similar form.]

Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

Thanks for the report.

To be sure the issue below doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced b5cfe6f7a6e1
#regzbot title drm/i915: BYT rendering broken due to "Remove short-term
pins from execbuf, v6"
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply -- ideally with also
telling regzbot about it, as explained here:
https://linux-regtracking.leemhuis.info/tracked-regression/

Reminder for developers: When fixing the issue, add 'Link:' tags
pointing to the report (the mail this one replied to), as the kernel's
documentation calls for; the above page explains why this is important for
tracked regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.



On 08.05.22 16:38, Hans de Goede wrote:
> Hi All,
> 
> When running a 5.18-rc4 (and -rc5) kernel on a Chuwi Hi 8, which is
> a Bay Trail based tablet with 2G RAM and a 1200x1920 DSI panel,
> I noticed that gnome-shell was misrendering. Many UI elements were
> missing (they were all black) and at the gdm login screen (which is
> a special gnome-shell session) the screen often was entirely black
> until I moved the cursor around and then various things got
> highlighted after which they sometimes stuck around and sometimes
> they disappeared again after the highlight.
> 
> Since this problem does not happen with various 5.17.y kernels I
> believe that this is a kernel regression in 5.18. I've bisected this
> and the bisect points to:
> 
> commit b5cfe6f7a6e1 ("drm/i915: Remove short-term pins from execbuf, v6.")
> 
> from Maarten. This commit cleanly reverts on top of 5.18-rc5 and
> I can confirm that 5.18-rc5 with b5cfe6f7a6e1 reverted fixes things.
> 
> I would be more than happy to test any possible fixes for this.
> 
> Regards,
> 
> Hans
> 
> 
> 


Re: [Intel-gfx] Xorg SEGV in Xen PV dom0 after updating from 5.16.18 to 5.17.5

2022-05-03 Thread Thorsten Leemhuis
On 04.05.22 08:48, Juergen Gross wrote:
> On 04.05.22 07:46, Thorsten Leemhuis wrote:
>> Hi, this is your Linux kernel regression tracker. Sending this just to
>> CC the developers of the culprit mentioned below (bdd8b6c98239cad
>> ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")) and the
>> maintainers for the subsystem.
>>
>> While at it a quick note: I wonder if this is a problem similar to one
>> that recently turned up with amdgpu and is fixed by this commit:
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=78b12008f20
>>
> 
> No, this is different.
> 
> I have posted a patch yesterday which should fix the issue:
> 
> https://lore.kernel.org/lkml/20220503132207.17234-3-jgr...@suse.com/T/#m75efc68c96d8f7160229b5f3147242221ce0c28c

Ahh, great, thx for letting us know.

#regzbot monitor:
https://lore.kernel.org/lkml/20220503132207.17234-1-jgr...@suse.com/

Ciao, Thorsten

>> Ciao, Thorsten
>>
>> On 04.05.22 02:37, Marek Marczykowski-Górecki wrote:
>>>
>>> After updating from 5.16.18 to 5.17.5 in Xen PV dom0, my Xorg started
>>> crashing when displaying any window mapped from a guest (domU) system.
>>> This is 100% reproducible.
>>> The system is Qubes OS, and it uses a trick that maps windows content
>>> from other guests using Xen grant tables, wrapped as "shared memory"
>>> from Xorg point of view (so, the memory that Xorg mmaps is not just from
>>> another process, but from another VM). That's the ShmPutImage you can
>>> see on the stack trace below.
>>>
>>> Stack trace of thread 12858:
>>> #0  0x7f80029e17d5 raise (libc.so.6 + 0x3c7d5)
>>> #1  0x7f80029ca895 abort (libc.so.6 + 0x25895)
>>> #2  0x5b3469ace0e0 OsAbort (Xorg + 0x1c60e0)
>>> #3  0x5b3469ad3959 AbortServer (Xorg + 0x1cb959)
>>> #4  0x5b3469ad46aa FatalError (Xorg + 0x1cc6aa)
>>> #5  0x5b3469acb450 OsSigHandler (Xorg + 0x1c3450)
>>> #6  0x7f8002b85a90 __restore_rt (libpthread.so.0 + 0x14a90)
>>> #7  0x7f8002b0a2a1 __memmove_avx_unaligned_erms (libc.so.6 + 0x1652a1)
>>> #8  0x7f80015dfcc9 linear_to_xtiled_faster (iris_dri.so + 0xc91cc9)
>>> #9  0x7f80015e3477 _isl_memcpy_linear_to_tiled (iris_dri.so + 0xc95477)
>>> #10 0x7f8001468440 iris_texture_subdata (iris_dri.so + 0xb1a440)
>>> #11 0x7f8000a76107 st_TexSubImage (iris_dri.so + 0x128107)
>>> #12 0x7f8000be9a47 texture_sub_image (iris_dri.so + 0x29ba47)
>>> #13 0x7f8000becd0c texsubimage_err (iris_dri.so + 0x29ed0c)
>>> #14 0x7f8000bf2939 _mesa_TexSubImage2D (iris_dri.so + 0x2a4939)
>>> #15 0x7f800213831f glamor_upload_boxes (libglamoregl.so + 0x1e31f)
>>> #16 0x7f800213856f glamor_upload_region (libglamoregl.so + 0x1e56f)
>>> #17 0x7f800212aea6 glamor_put_image (libglamoregl.so + 0x10ea6)
>>> #18 0x5b3469a4d79c damagePutImage (Xorg + 0x14579c)
>>> #19 0x5b3469a00a7e ProcShmPutImage (Xorg + 0xf8a7e)
>>> #20 0x5b3469965a2b Dispatch (Xorg + 0x5da2b)
>>> #21 0x5b3469969b04 dix_main (Xorg + 0x61b04)
>>> #22 0x7f80029cc082 __libc_start_main (libc.so.6 + 0x27082)
>>> #23 0x5b3469952e6e _start (Xorg + 0x4ae6e)
>>>
>>> Disassembly of the surrounding code:
>>>
>>>     0x7596ae8c82fb <+123>:    ja 0x7596ae8c8338 <__memmove_avx_unaligned_erms+184>
>>>     0x7596ae8c82fd <+125>:    jb 0x7596ae8c8304 <__memmove_avx_unaligned_erms+132>
>>>     0x7596ae8c82ff <+127>:    movzbl (%rsi),%ecx
>>>     0x7596ae8c8302 <+130>:    mov    %cl,(%rdi)
>>>     0x7596ae8c8304 <+132>:    retq
>>>     0x7596ae8c8305 <+133>:    vmovdqu (%rsi),%xmm0
>>>     0x7596ae8c8309 <+137>:    vmovdqu -0x10(%rsi,%rdx,1),%xmm1
>>> => 0x7596ae8c830f <+143>:    vmovdqu %xmm0,(%rdi)
>>>     0x7596ae8c8313 <+147>:    vmovdqu %xmm1,-0x10(%rdi,%rdx,1)
>>>     0x7596ae8c8319 <+153>:    retq
>>>
>>>
>>> I don't see any related kernel or Xen messages at this time. Xorg's SEGV
>>> handler prints also:
>>>
>>>  (EE) Segmentation fault at address 0x3c010
>>>
>>> Git bisect says it's bdd8b6c98239cad ("drm/i915: replace X86_FEATURE_PAT
>>> with pat_enabled()"), and indeed with this commit reverted on top of
>>> 5.17.5 everything works fine.
>>>
>>> I guess this part of dom0's boot dmesg may be relevant:
>>>
>>> [    0.000949] x86/PAT: MTRRs disabled, skipping PAT initialization too.
>>> [    0.000953] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WC  WP  UC  UC
>>>
>>> Originally reported at
>>> https://github.com/QubesOS/qubes-issues/issues/7479
>>>
>>>   #regzbot introduced bdd8b6c98239cad
>>> #regzbot monitor: https://github.com/QubesOS/qubes-issues/issues/7479
>>>
> 


Re: [Intel-gfx] Xorg SEGV in Xen PV dom0 after updating from 5.16.18 to 5.17.5

2022-05-03 Thread Thorsten Leemhuis
Hi, this is your Linux kernel regression tracker. Sending this just to
CC the developers of the culprit mentioned below (bdd8b6c98239cad
("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")) and the
maintainers for the subsystem.

While at it a quick note: I wonder if this is a problem similar to one
that recently turned up with amdgpu and is fixed by this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=78b12008f20

Ciao, Thorsten

On 04.05.22 02:37, Marek Marczykowski-Górecki wrote:
> 
> After updating from 5.16.18 to 5.17.5 in Xen PV dom0, my Xorg started
> crashing when displaying any window mapped from a guest (domU) system.
> This is 100% reproducible.
> The system is Qubes OS, and it uses a trick that maps windows content
> from other guests using Xen grant tables, wrapped as "shared memory"
> from Xorg point of view (so, the memory that Xorg mmaps is not just from
> another process, but from another VM). That's the ShmPutImage you can
> see on the stack trace below.
> 
> Stack trace of thread 12858:
> #0  0x7f80029e17d5 raise (libc.so.6 + 0x3c7d5)
> #1  0x7f80029ca895 abort (libc.so.6 + 0x25895)
> #2  0x5b3469ace0e0 OsAbort (Xorg + 0x1c60e0)
> #3  0x5b3469ad3959 AbortServer (Xorg + 0x1cb959)
> #4  0x5b3469ad46aa FatalError (Xorg + 0x1cc6aa)
> #5  0x5b3469acb450 OsSigHandler (Xorg + 0x1c3450)
> #6  0x7f8002b85a90 __restore_rt (libpthread.so.0 + 0x14a90)
> #7  0x7f8002b0a2a1 __memmove_avx_unaligned_erms (libc.so.6 + 0x1652a1)
> #8  0x7f80015dfcc9 linear_to_xtiled_faster (iris_dri.so + 0xc91cc9)
> #9  0x7f80015e3477 _isl_memcpy_linear_to_tiled (iris_dri.so + 0xc95477)
> #10 0x7f8001468440 iris_texture_subdata (iris_dri.so + 0xb1a440)
> #11 0x7f8000a76107 st_TexSubImage (iris_dri.so + 0x128107)
> #12 0x7f8000be9a47 texture_sub_image (iris_dri.so + 0x29ba47)
> #13 0x7f8000becd0c texsubimage_err (iris_dri.so + 0x29ed0c)
> #14 0x7f8000bf2939 _mesa_TexSubImage2D (iris_dri.so + 0x2a4939)
> #15 0x7f800213831f glamor_upload_boxes (libglamoregl.so + 0x1e31f)
> #16 0x7f800213856f glamor_upload_region (libglamoregl.so + 0x1e56f)
> #17 0x7f800212aea6 glamor_put_image (libglamoregl.so + 0x10ea6)
> #18 0x5b3469a4d79c damagePutImage (Xorg + 0x14579c)
> #19 0x5b3469a00a7e ProcShmPutImage (Xorg + 0xf8a7e)
> #20 0x5b3469965a2b Dispatch (Xorg + 0x5da2b)
> #21 0x5b3469969b04 dix_main (Xorg + 0x61b04)
> #22 0x7f80029cc082 __libc_start_main (libc.so.6 + 0x27082)
> #23 0x5b3469952e6e _start (Xorg + 0x4ae6e)
> 
> Disassembly of the surrounding code:
> 
>    0x7596ae8c82fb <+123>:    ja 0x7596ae8c8338 <__memmove_avx_unaligned_erms+184>
>    0x7596ae8c82fd <+125>:    jb 0x7596ae8c8304 <__memmove_avx_unaligned_erms+132>
>    0x7596ae8c82ff <+127>:    movzbl (%rsi),%ecx
>    0x7596ae8c8302 <+130>:    mov    %cl,(%rdi)
>    0x7596ae8c8304 <+132>:    retq
>    0x7596ae8c8305 <+133>:    vmovdqu (%rsi),%xmm0
>    0x7596ae8c8309 <+137>:    vmovdqu -0x10(%rsi,%rdx,1),%xmm1
> => 0x7596ae8c830f <+143>:    vmovdqu %xmm0,(%rdi)
>    0x7596ae8c8313 <+147>:    vmovdqu %xmm1,-0x10(%rdi,%rdx,1)
>    0x7596ae8c8319 <+153>:    retq
> 
> 
> I don't see any related kernel or Xen messages at this time. Xorg's SEGV
> handler prints also:
> 
> (EE) Segmentation fault at address 0x3c010
> 
> Git bisect says it's bdd8b6c98239cad ("drm/i915: replace X86_FEATURE_PAT
> with pat_enabled()"), and indeed with this commit reverted on top of
> 5.17.5 everything works fine.
> 
> I guess this part of dom0's boot dmesg may be relevant:
> 
> [    0.000949] x86/PAT: MTRRs disabled, skipping PAT initialization too.
> [    0.000953] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WC  WP  UC  UC
> 
> Originally reported at
> https://github.com/QubesOS/qubes-issues/issues/7479
> 
>  
> #regzbot introduced bdd8b6c98239cad
> #regzbot monitor: https://github.com/QubesOS/qubes-issues/issues/7479
> 


[Intel-gfx] Backlight Regression in i915 that isn't handled appropriately afaics

2022-03-24 Thread Thorsten Leemhuis
Hi i915 maintainers, this is your Linux kernel regression tracker!
What's up with the following regression?

https://gitlab.freedesktop.org/drm/intel/-/issues/5284

That report is more than two weeks old now, but it seems nothing of
substance has happened. And the thing is: the issue itself is older, as it
was in fact already reported on 2022-01-31 here:

https://bugzilla.kernel.org/show_bug.cgi?id=215553

Later, a different ticket about it was opened here:
https://gitlab.freedesktop.org/drm/intel/-/issues/5027

But it got confusing, so the reporter created the ticket the first link
in this message points to. I fully understand some of the reasons why
this was not handled appropriately, but it looks like even the latest
ticket is mostly ignored, apart from some bug triaging.

So could anybody please take a look into this and at least tell the
reporter what to do (bisection maybe?) to get this solved?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.

P.P.S.: for regzbot:

Link:
https://lore.kernel.org/regressions/74ee2216-a295-c2b6-328b-3e6d0cc18...@leemhuis.info/




Re: [Intel-gfx] [PATCH 02/10] drm: Add privacy-screen class (v4)

2021-12-10 Thread Thorsten Leemhuis
On 10.12.21 11:46, Hans de Goede wrote:
> On 12/10/21 11:12, Thorsten Leemhuis wrote:
>> Hi, this is your Linux kernel regression tracker speaking.
>>
>> Top-posting for once, to make this easily accessible to everyone.
>>
>> Hans, I stumbled upon your blog post about something that afaics is
>> related to below patch:
>> https://hansdegoede.livejournal.com/25948.html
>>
>> To quote:
>>
>>> To avoid any regressions distors should modify their initrd
>>> generation tools to include privacy screen provider drivers in the
>>> initrd (at least on systems with a privacy screen), before 5.17
>>> kernels start showing up in their repos.
>>>
>>> If this change is not made, then users using a graphical bootsplash
>>> (plymouth) will get an extra boot-delay of up to 8 seconds
>>> (DeviceTimeout in plymouthd.defaults) before plymouth will show and
>>> when using disk-encryption where the LUKS password is requested from
>>> the initrd, the system will fallback to text-mode after these 8
>>> seconds.
>>
>> Sorry for intruding, but to me as the kernel's regression tracker that
>> blog post sounds a whole lot like "by kernel development standards this
>> is not allowed due to the 'no regression rule', as users should never be
>> required to update something in userspace when updating the kernel".
> 
> I completely understand where you are coming from here, but AFAIK
> the initrd generation has always been a bit of an exception here.

Many thx for the clarification. Yeah, kinda, but it afaics also partly
depends on what kind of breakage users have to endure -- which according
to the description is not that bad in this case, so I guess in this case
everything is fine as it is.

Again, thx for your answer.

Ciao, Thorsten

> For example (IIRC) over time we have seen the radeon module growing
> a runtime dependency on the amdkfd module which also required updated
> initrd generation tools.
> 
> Another example (which I'm sure of) is the i915 driver gaining a softdep
> on kvmgt (which is now gone again) which required new enough kmod tools
> to understand this as well as initrd generation tools updates to also
> take softdeps into account:
> https://github.com/dracutdevs/dracut/commit/4cdee66c8ed5f82bbd0638e30d867318343b0e6c
> 
> More in general if you look at e.g. dracut's git history, there are
> various other cases where dracut needed to be updated to adjust
> to kernel changes. For example dracut decides if a module is a block
> driver and thus may be necessary to have in the initrd based on a
> list of kernel-symbols the module links to and sometimes those
> symbols change due to refactoring of kernel internals, see e.g. :
> https://github.com/dracutdevs/dracut/commit/b292ce7295f18192124e64e5ec31161d09492160
> 
> TL;DR: initrd-generators and the kernel are simply tied together so much
> that users cannot expect to be able to jump to the latest kernel without
> either updating the initrd-generator, or adding some modules as modules
> which must always be added to the initrd in the initrd-generator config
> file (as a workaround).
> 
> Declaring kernel changes which break initrd-generation in some way as
> being regressions, would mean that e.g. we cannot introduce any
> kernel changes which causes some drm/block/whatever drivers to use
> some new register helper functions which are not yet on the list of
> symbols which dracut uses to identify drm/block/whatever drivers.
> 
> The only difference between previous initrd-generator breaking changes
> and this one, is that I decided that it would be good for everyone
> to be aware of this before hand; and now I get the feeling that I'm
> being punished for warning people about this instead of just letting
> things break silently. I know you don't intend your email this way in
> any way, but still.
> 
> Also AFAIK drivers may also at some point drop support for (much) older
> firmware versions requiring installing a recent linux-firmware together
> with a new kernel.
> 
> In my reading of the rules the 'users should never be required to update
> something in userspace when updating the kernel' rule is about keeping
> people's normal programs working, IOW not breaking userspace ABI and that
> is not happening here.
> 
>> But I'm not totally sure that's the case here. Could you please clarify
>> what happens when a user doesn't update the initramfs. E.g. what happens
>> besides the mentioned delay and the text mode (which are bad already,
>> but might be an acceptable compromise here -- but that's up for Linus to
>> decide)? Does everythin

Re: [Intel-gfx] [PATCH 02/10] drm: Add privacy-screen class (v4)

2021-12-10 Thread Thorsten Leemhuis
Hi, this is your Linux kernel regression tracker speaking.

Top-posting for once, to make this easily accessible to everyone.

Hans, I stumbled upon your blog post about something that afaics is
related to below patch:
https://hansdegoede.livejournal.com/25948.html

To quote:

> To avoid any regressions distors should modify their initrd
> generation tools to include privacy screen provider drivers in the
> initrd (at least on systems with a privacy screen), before 5.17
> kernels start showing up in their repos.
> 
> If this change is not made, then users using a graphical bootsplash
> (plymouth) will get an extra boot-delay of up to 8 seconds
> (DeviceTimeout in plymouthd.defaults) before plymouth will show and
> when using disk-encryption where the LUKS password is requested from
> the initrd, the system will fallback to text-mode after these 8
> seconds.

Sorry for intruding, but to me as the kernel's regression tracker that
blog post sounds a whole lot like "by kernel development standards this
is not allowed due to the 'no regression rule', as users should never be
required to update something in userspace when updating the kernel".

But I'm not totally sure that's the case here. Could you please clarify
what happens when a user doesn't update the initramfs. E.g. what happens
besides the mentioned delay and the text mode (which are bad already,
but might be an acceptable compromise here -- but that's up for Linus to
decide)? Does everything start to work normally shortly after the kernel
mounted the rootfs and finally can load the missing module?

tia!

Ciao, Thorsten

On 05.10.21 22:23, Hans de Goede wrote:
> On some new laptops the LCD panel has a builtin electronic privacy-screen.
> We want to export this functionality as a property on the drm connector
> object. But often this functionality is not exposed on the GPU but on some
> other (ACPI) device.
> 
> This commit adds a privacy-screen class allowing the driver for these
> other devices to register themselves as a privacy-screen provider; and
> allowing the drm/kms code to get a privacy-screen provider associated
> with a specific GPU/connector combo.
> 
> Changes in v2:
> - Make CONFIG_DRM_PRIVACY_SCREEN a bool which controls if the drm_privacy
>   code gets built as part of the main drm module rather then making it
>   a tristate which builds its own module.
> - Add a #if IS_ENABLED(CONFIG_DRM_PRIVACY_SCREEN) check to
>   drm_privacy_screen_consumer.h and define stubs when the check fails.
>   Together these 2 changes fix several dependency issues.
> - Remove module related code now that this is part of the main drm.ko
> - Use drm_class as class for the privacy-screen devices instead of
>   adding a separate class for this
> 
> Changes in v3:
> - Make the static inline drm_privacy_screen_get_state() stub set sw_state
>   and hw_state to PRIVACY_SCREEN_DISABLED to squelch an uninitialized
>   variable warning when CONFIG_DRM_PRIVICAY_SCREEN is not set
> 
> Changes in v4:
> - Make drm_privacy_screen_set_sw_state() skip calling out to the hw if
>   hw_state == new_sw_state
> 
> Reviewed-by: Emil Velikov 
> Reviewed-by: Lyude Paul 
> Signed-off-by: Hans de Goede 
> ---
>  Documentation/gpu/drm-kms-helpers.rst |  15 +
>  MAINTAINERS   |   8 +
>  drivers/gpu/drm/Kconfig   |   4 +
>  drivers/gpu/drm/Makefile  |   1 +
>  drivers/gpu/drm/drm_drv.c |   4 +
>  drivers/gpu/drm/drm_privacy_screen.c  | 403 ++
>  include/drm/drm_privacy_screen_consumer.h |  50 +++
>  include/drm/drm_privacy_screen_driver.h   |  80 +
>  include/drm/drm_privacy_screen_machine.h  |  41 +++
>  9 files changed, 606 insertions(+)
>  create mode 100644 drivers/gpu/drm/drm_privacy_screen.c
>  create mode 100644 include/drm/drm_privacy_screen_consumer.h
>  create mode 100644 include/drm/drm_privacy_screen_driver.h
>  create mode 100644 include/drm/drm_privacy_screen_machine.h
> 
> diff --git a/Documentation/gpu/drm-kms-helpers.rst b/Documentation/gpu/drm-kms-helpers.rst
> index ec2f65b31930..5bb55ec1b9b5 100644
> --- a/Documentation/gpu/drm-kms-helpers.rst
> +++ b/Documentation/gpu/drm-kms-helpers.rst
> @@ -435,3 +435,18 @@ Legacy CRTC/Modeset Helper Functions Reference
>  
>  .. kernel-doc:: drivers/gpu/drm/drm_crtc_helper.c
> :export:
> +
> +Privacy-screen class
> +====================
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_privacy_screen.c
> +   :doc: overview
> +
> +.. kernel-doc:: include/drm/drm_privacy_screen_driver.h
> +   :internal:
> +
> +.. kernel-doc:: include/drm/drm_privacy_screen_machine.h
> +   :internal:
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_privacy_screen.c
> +   :export:
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 28e5f0ae1009..cb94bb3b8724 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6423,6 +6423,14 @@ F: drivers/gpu/drm/drm_panel.c
>  F:   drivers/gpu/drm/panel/
>  F:   include/drm/drm_panel.h
>  
> +DRM PRIVACY-SCREEN CLASS
> 

Re: [Intel-gfx] [PATCH 00/53] Get rid of UTF-8 chars that can be mapped as ASCII

2021-05-10 Thread Thorsten Leemhuis

On 10.05.21 12:26, Mauro Carvalho Chehab wrote:
>
> As Linux developers are all around the globe, and not everybody has UTF-8
> as their default charset, better to use UTF-8 only on cases where it is really
> needed.
> […]
> The remaining patches on series address such cases on *.rst files and 
> inside the Documentation/ABI, using this perl map table in order to do the
> charset conversion:
> 
> my %char_map = (
> […]
>   0x2013 => '-',  # EN DASH
>   0x2014 => '-',  # EM DASH

I might be performing bike shedding here, but wouldn't it be better to
replace those two with "--", as explained in
https://en.wikipedia.org/wiki/Dash#Approximating_the_em_dash_with_two_or_three_hyphens

For EM DASH there seems to be even "---", but I'd say that is a bit too
much.

Or do you fear the extra work, as some lines might then exceed the
80-character limit?
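
To make the trade-off concrete, here is a minimal Python sketch (not the
Perl script from the series; DASH_MAP, replace_dashes and overlong_lines
are hypothetical names picked for illustration) that maps EN DASH and
EM DASH to "--" and reports which lines would then exceed 80 characters:

```python
# Hypothetical illustration only: the patch series itself uses a Perl map table.
DASH_MAP = {
    0x2013: "--",  # EN DASH
    0x2014: "--",  # EM DASH
}


def replace_dashes(text):
    """Replace EN DASH and EM DASH with two ASCII hyphens."""
    return text.translate(DASH_MAP)


def overlong_lines(text, limit=80):
    """Return the 1-based numbers of lines longer than `limit` characters."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if len(line) > limit]


if __name__ == "__main__":
    # First line is exactly 80 characters before conversion, 81 after.
    sample = "x" * 79 + "\u2014\n" + "short \u2013 line\n"
    converted = replace_dashes(sample)
    print(overlong_lines(converted))
```

Each replaced dash grows a line by one character, so only lines that
already sit right at the limit would need rewrapping.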

Ciao, Thorsten
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [BUG][REGRESSION] i915 gpu hangs under load

2017-04-02 Thread Thorsten Leemhuis
Lo! On 22.03.2017 11:36, Jani Nikula wrote:
> On Wed, 22 Mar 2017, Martin Kepplinger  wrote:
>> I know something similar is here: 
>> https://bugs.freedesktop.org/show_bug.cgi?id=100110 too.
>> But this is rc3 and my machine is totally *not usable*. Let me be 
>> annoying :) I hope I can help:
> Please file a bug over at [1].
> […]
> [1] https://bugs.freedesktop.org/enter_bug.cgi?product=DRI&component=DRM/Intel

@Martin: did you file that bug? I could not find one :-/

@Jani: In similar situations could you do me a favour and ask people to
send one more reply to the public list which contains the link to the
bug filed? Regression tracking is quite hard already; searching various
bug trackers for follow-up bug entries makes it even harder :-(

Ciao, Thorsten


Re: [Intel-gfx] [regression] Re: 4.11-rc0, thinkpad x220: GPU hang

2017-03-14 Thread Thorsten Leemhuis
On 06.03.2017 00:01, Pavel Machek wrote:
>>> mplayer stopped working after a while. Dmesg says:
>>>
>>> [ 3000.266533] cdc_ether 2-1.2:1.0 usb0: register 'cdc_ether' at
> Now I'm pretty sure it is a regression in v4.11-rc0. Any ideas what to
> try? Bisect will be slow and nasty :-(.

@Pavel, @Chris: What's the status of this?

I added this report to the list of regressions for Linux 4.11. I'll try
to watch this thread for further updates on this issue to document
progress in my weekly reports. Please let me know in case the discussion
moves to a different place (bugzilla or another mail thread for
example). tia!

Ciao, Thorsten


Re: [Intel-gfx] Skylake graphics regression: projector failure with 4.8-rc3

2016-09-18 Thread Thorsten Leemhuis
Hi! James & Paulo: What's the current status of this? Was this issue
discussed elsewhere or even fixed in the meantime? Just asking, because this
issue is on the list of regressions for 4.8. Ciao, Thorsten

On 01.09.2016 00:25, James Bottomley wrote:
> On Wed, 2016-08-31 at 21:51 +, Zanoni, Paulo R wrote:
>> Em Qua, 2016-08-31 às 14:43 -0700, James Bottomley escreveu:
>>> On Wed, 2016-08-31 at 11:23 -0700, James Bottomley wrote:
>>>> On Fri, 2016-08-26 at 09:10 -0400, James Bottomley wrote:
>>>>> We seem to have an xrandr regression with skylake now.  What's
>>>>> happening is that I can get output on to a projector, but the
>>>>> system is losing video when I change the xrandr sessions (like
>>>>> going from a --above b to a --same-as b).  The main screen goes
>>>>> blank, which is basically a reboot situation.  Unfortunately, I
>>>>> can't seem to get the logs out of systemd to see if there was a
>>>>> dump to dmesg (the system was definitely responding).
>>>>>
>>>>> I fell back to 4.6.2 which worked perfectly, so this is definitely
>>>>> some sort of regression.  I'll be able to debug more fully when I
>>>>> get back home from the Linux Security Summit.
>>>>
>>>> I'm home now.  Unfortunately, my monitor isn't as problematic as the
>>>> projector, but by flipping between various modes and separating and
>>>> overlaying the panels with --above and --same-as (xrandr), I can
>>>> eventually get it to the point where the main LCD panel goes black
>>>> and can only be restarted by specifying a different mode.
>>>>
>>>> This seems to be associated with these lines in the X
>>>>
>>>> [ 14714.389] (EE) intel(0): failed to set mode: Invalid argument [22]
>>>>
>>>> But the curious thing is that even if this fails with the error
>>>> message once, it may succeed a second time, so it looks to be a
>>>> transient error translation problem from the kernel driver.
>>>>
>>>> I've attached the full log below.
>>>>
>>>> This is only with a VGA output.  I currently don't have a HDMI
>>>> dongle, but I'm in the process of acquiring one.
>>>
>>> After more playing around, I'm getting thousands of these in the kernel
>>> log (possibly millions: the log wraps very fast):
>>>
>>> [23504.873606] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
>>>
>>> And then finally it gives up with
>>>
>>> [25023.770951] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
>>> [25561.926075] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
>>>
>>> And the crtc for the VGA output becomes non-responsive to any
>>> configuration command.  This requires a reboot and sometimes a UEFI
>>> variable reset before it comes back.
>>
>> Please see this discussion:
>> https://patchwork.freedesktop.org/patch/103237/
>>
>> Do you have this patch on your tree? Does the problem go away if you
>> revert it?
> 
> Yes, I've got it, it went in in 4.8-rc3 according to git:
> 
> commit 58e311b09c319183254d9220c50a533e7157c9ab
> Author: Matt Roper 
> Date:   Thu Aug 4 14:08:00 2016 -0700
> 
> drm/i915/gen9: Give one extra block per line for SKL plane WM
> calculations
> 
> Reverting it causes the secondary display not to sync pretty much at
> all.  However, in the flickers I can see, it does work OK and doesn't
> now crash switching from --same-as to --above and back
> 
> I also still get the logs filling up with the link training errors.
> 
> On balance, although the behaviour is different, it's not an
> improvement because if I can't sync with the projector, I can't really
> use this as a fix.
> 
> James


Re: [Intel-gfx] Bad flicker on skylake HQD due to code in the 4.7 merge window

2016-07-01 Thread Thorsten Leemhuis
On 23.06.2016 13:25, James Bottomley wrote:
> On Tue, 2016-06-21 at 17:00 -0400, James Bottomley wrote:
>> On Tue, 2016-06-21 at 18:44 +0300, Ville Syrjälä wrote:
>>> On Tue, Jun 21, 2016 at 09:53:15AM -0400, James Bottomley wrote:
>>>> On Mon, 2016-06-20 at 11:03 +0300, Jani Nikula wrote:
>>>>> Cc: Ville
>>>>>
>>>>> On Mon, 20 Jun 2016, James Bottomley <
>>>>> james.bottom...@hansenpartnership.com> wrote:
>>>>>> OK, my candidate bad commit is this one:
>>>>>>
>>>>>> commit a05628195a0d9f3173dd9aa76f482aef692e46ee
>>>>>> Author: Ville Syrjälä
>>>>>> Date:   Mon Apr 11 10:23:51 2016 +0300
>>>>>>
>>>>>> drm/i915: Get panel_type from OpRegion panel details
>>>>>>
>>>>>> After being more careful about waiting to identify flicker,
>>>>>> this one seems to be the one the bisect finds.  I'm now
>>>>>> running v4.7-rc3 with this one reverted and am currently
>>>>>> seeing no flicker problems.   It is, however, early days
>>>>>> because the flicker can hide for long periods, so I'll wait
>>>>>> until Monday evening and a few reboots before declaring
>>>>>> victory.
>>>>>
>>>>> If that turns out to be the bad commit, it doesn't really
>>>>> surprise me, and that in itself is depressing.
>>>>
>>>> As far as I can tell, after running for a day with this reverted,
>>>> this is the problem.  The flicker hasn't appeared with it
>>>> reverted.  It's pretty noticeable with this commit included.
>>>
>>> Hmm. The only difference I can see is low vs. normal vswing. Panel 
>>> 0 has low, panel 2 has normal. So either the VBT or opregion is
>>> telling utter lies, or there's some other bug in our low vswing
>>> support.
>>>
>>> To confirm it's really a vswing issue, you should be able to run 
>>> with i915.edp_vswing=2 without flickers on the broken kernel.
>>
>> Preliminary boot indicates no flicker with the bad commit included 
>> and this option, but I'll have to run for quite a bit longer to 
>> verify, since it can sometimes be elusive.
> 
> Two days of runtime seems to confirm this is the problem (still no
> flicker issues).

This issue is listed in my regression reports for 4.7 and I wonder what
the status is. It seems nothing has happened for more than a week now, which
is a bad sign as 4.7 final seems only a week or two away.

CU, Thorsten


Re: [Intel-gfx] [PATCH v3 1/2] iommu: Disable preemption around use of this_cpu_ptr()

2016-06-27 Thread Thorsten Leemhuis
On 15.06.2016 14:25, Joerg Roedel wrote:
> On Wed, Jun 01, 2016 at 12:10:08PM +0100, Chris Wilson wrote:
>> Between acquiring the this_cpu_ptr() and using it, ideally we don't want
>> to be preempted and work on another CPU's private data. this_cpu_ptr()
>> checks whether or not preemption is disabled, and get_cpu_ptr() provides
>> a convenient wrapper for operating on the cpu ptr inside a preemption
>> disabled critical section (which currently is provided by the
>> spinlock). Indeed if we disable preemption around this_cpu_ptr,
>> we do not need the CPU local spinlock - so long as we take care that no other
>> CPU is running that code as we perform the cross-CPU cache flushing and
>> teardown, but that is a subject for another patch.
> […]
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96293
> […]
>> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
>> index ba764a0835d3..e23001bfcfee 100644
>> --- a/drivers/iommu/iova.c
>> +++ b/drivers/iommu/iova.c
>> @@ -420,8 +420,10 @@ retry:
>>  
>>  /* Try replenishing IOVAs by flushing rcache. */
>>  flushed_rcache = true;
>> +preempt_disable();
>>  for_each_online_cpu(cpu)
>>  free_cpu_cached_iovas(cpu, iovad);
>> +preempt_enable();
> 
> Why do you need to disable preemption here? The free_cpu_cached_iovas
> function does not need to stay on the same cpu as it iterates over the
> rcaches for all cpus anyway.

Joerg, what's the status here? This made it on my 4.7 regressions
report, as the patches from this thread are supposed to fix a
regression; see
http://thread.gmane.org/gmane.linux.usb.general/143504/focus=153154
for details.

Please let me know if fixes went to mainline already; I did a quick
check and could not see any.

Sincerely, your regression tracker for Linux 4.7 (http://bit.ly/28JRmJo)
 Thorsten


[Intel-gfx] did the drm patch to support Iris(TM) Graphics P555 fell through the cracks? (Was: Re: RFC: libdrm: Support Iris Graphics 540 & 550 (Skylake GT3e))

2016-05-31 Thread Thorsten Leemhuis
CCing danvet

Thorsten Leemhuis wrote on 28.04.2016 10:37:
> Kenneth Graunke wrote on 28.04.2016 03:05:
>> On Wednesday, April 27, 2016 9:37:07 AM PDT Thorsten Leemhuis wrote:
>>> Thorsten Leemhuis wrote on 26.04.2016 13:41:
>>> Forget that patch -- a way better one was submitted weeks ago by Michal
>>> already:
>>> https://lists.freedesktop.org/archives/intel-gfx/2016-February/087819.html
>> It looks like it fell through the cracks.  Roland just mentioned this on
>> IRC...I've reviewed and pushed the patch to master.  I'm also making a
>> release.
> Many thx. Side note, while at it: I think this linux-drm patch from
> Michał fell through the cracks, too:
> https://lists.freedesktop.org/archives/intel-gfx/2016-February/087855.html

Quote from that linux-drm patch
"""
Used by production device:
Intel(R) Iris(TM) Graphics P555
"""
A Xeon processor with said gpu is now available afaics:
http://ark.intel.com/products/93847/Intel-Xeon-Processor-E3-1558L-v5-8M-Cache-1_90-GHz

> Whole story:  That libdrm patch you applied contained this line:
> +#define PCI_CHIP_SKYLAKE_SRV_GT3 0x192D
> This id for the Iris Graphics P555 is also present in Mesa master
> i965(¹), but missing in Linux master afaics:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/drm/i915_pciids.h#n279
> And it's not in drm-intel-next either afaics:
> https://cgit.freedesktop.org/drm-intel/tree/include/drm/i915_pciids.h?h=drm-intel-next#n279

That patch (see below) afaics still wasn't applied; and it's not in
drm-intel-next-queued either. I'm wondering if there is a reason why it
wasn't merged or if it is another oversight :-/

CU, knurd

P.S.: FWIW, here is the content from
https://lists.freedesktop.org/archives/intel-gfx/2016-February/087855.html :
"""
Used by production device:
Intel(R) Iris(TM) Graphics P555

Signed-off-by: Michał Winiarski 
---
 include/drm/i915_pciids.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/drm/i915_pciids.h b/include/drm/i915_pciids.h
index 9094599..9266c2c 100644
--- a/include/drm/i915_pciids.h
+++ b/include/drm/i915_pciids.h
@@ -281,6 +281,7 @@
INTEL_VGA_DEVICE(0x1926, info), /* ULT GT3 */ \
INTEL_VGA_DEVICE(0x1927, info), /* ULT GT3 */ \
INTEL_VGA_DEVICE(0x192B, info), /* Halo GT3 */ \
+   INTEL_VGA_DEVICE(0x192D, info), /* SRV GT3 */ \
INTEL_VGA_DEVICE(0x192A, info)  /* SRV GT3 */

 #define INTEL_SKL_GT4_IDS(info) \
-- 
2.7.1
"""


Re: [Intel-gfx] RFC: libdrm: Support Iris Graphics 540 & 550 (Skylake GT3e)

2016-04-28 Thread Thorsten Leemhuis
Kenneth Graunke wrote on 28.04.2016 03:05:
> On Wednesday, April 27, 2016 9:37:07 AM PDT Thorsten Leemhuis wrote:
>> Thorsten Leemhuis wrote on 26.04.2016 13:41:
>>> Lo! Below patch adds the PCI-ID for the Intel(R) Iris Graphics 550 (Skylake
>>> GT3e mobile) to libdrm. It afaics is the last piece that is missing to
>>> make those GPUs work properly, as Linux 4.6-rc(¹) and Mesa 11.2 already
>>> support it –
> […]
>> Forget that patch -- a way better one was submitted weeks ago by Michal
>> already:
>> https://lists.freedesktop.org/archives/intel-gfx/2016-February/087819.html
> It looks like it fell through the cracks.  Roland just mentioned this on
> IRC...I've reviewed and pushed the patch to master.  I'm also making a
> release.

Many thx. Side note, while at it: I think this linux-drm patch from
Michał fell through the cracks, too:
https://lists.freedesktop.org/archives/intel-gfx/2016-February/087855.html

Whole story:  That libdrm patch you applied contained this line:

+#define PCI_CHIP_SKYLAKE_SRV_GT3 0x192D

This id for the Iris Graphics P555 is also present in Mesa master
i965(¹), but missing in Linux master afaics:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/drm/i915_pciids.h#n279
And it's not in drm-intel-next either afaics:
https://cgit.freedesktop.org/drm-intel/tree/include/drm/i915_pciids.h?h=drm-intel-next#n279
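
FWIW, such a cross-check is easy to script. A rough Python sketch,
assuming the INTEL_VGA_DEVICE(0x...., info) macro form used in
i915_pciids.h; the function name and the sample excerpt are only
illustrative, not the full header:

```python
import re

# Matches entries of the form INTEL_VGA_DEVICE(0x192A, info)
_ID_RE = re.compile(r"INTEL_VGA_DEVICE\(\s*0x([0-9A-Fa-f]{4})\s*,")


def pci_ids_in_header(header_text):
    """Return the set of PCI device IDs listed via INTEL_VGA_DEVICE()."""
    return {int(m, 16) for m in _ID_RE.findall(header_text)}


if __name__ == "__main__":
    # Excerpt modelled on the Skylake GT3 entries; not the full header.
    excerpt = """
    INTEL_VGA_DEVICE(0x1926, info), /* ULT GT3 */ \\
    INTEL_VGA_DEVICE(0x1927, info), /* ULT GT3 */ \\
    INTEL_VGA_DEVICE(0x192B, info), /* Halo GT3 */ \\
    INTEL_VGA_DEVICE(0x192A, info)  /* SRV GT3 */
    """
    wanted = 0x192D  # PCI_CHIP_SKYLAKE_SRV_GT3 from the libdrm patch
    print(hex(wanted), "present:", wanted in pci_ids_in_header(excerpt))
```

Running the same function over the libdrm, Mesa and kernel headers and
comparing the resulting sets would show which IDs one side is missing.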

CU, knurd

(¹)
https://cgit.freedesktop.org/mesa/mesa/tree/include/pci_ids/i965_pci_ids.h#n132



Re: [Intel-gfx] RFC: libdrm: Support Iris Graphics 540 & 550 (Skylake GT3e)

2016-04-27 Thread Thorsten Leemhuis
Thorsten Leemhuis wrote on 26.04.2016 13:41:
> Lo! Below patch adds the PCI-ID for the Intel(R) Iris Graphics 550 (Skylake
> GT3e mobile) to libdrm. It afaics is the last piece that is missing to
> make those GPUs work properly, as Linux 4.6-rc(¹) and Mesa 11.2 already
> support it – but without this patch I get a "error initializing buffer
> manager" message from i965 when it tries to load. I tested it on a
> laptop with a Core i5-6267U and it seems to work -- but I only did a
> few quick tests so far.

Forget that patch -- a way better one was submitted weeks ago by Michal
already:
https://lists.freedesktop.org/archives/intel-gfx/2016-February/087819.html

Did that patch simply fall through the cracks or is there a reason why
it wasn't applied to libdrm master? It is pretty obvious it would fix
the problem I saw and tried to address with that rough patch I sent
yesterday.

CU, knurd

P.S.: Added intel-gfx and michal.winiarski to CC