[regression] opening and closing /dev/dri/card0 in a QEMU KVM instance will shutdown system

2024-07-24 Thread Linux regression tracking (Thorsten Leemhuis)
Hi, Thorsten here, the Linux kernel's regression tracker.

I noticed a report about a regression in bugzilla.kernel.org that
appears to be related to the simpledrm driver. As many (most?) kernel
developers don't keep an eye on the bug tracker, I decided to write this
mail. To quote from https://bugzilla.kernel.org/show_bug.cgi?id=219007 :

>  Colin Ian King 2024-07-05 16:05:27 UTC
> 
> The following code when run as root on a Debian sid amd64 server
> running in virt-manager (KVM QEMU) will shut the system down with
> 6.10.0-rc6.  The fork() is required to cause racing on the open/close on
> /dev/dri/card0
> 
> #include <fcntl.h>
> #include <unistd.h>
> 
> int main(void)
> {
>     pid_t pid = fork();
> 
>     while (1) {
>         int fd;
> 
>         fd = openat(AT_FDCWD, "/dev/dri/card0",
>                     O_WRONLY|O_NONBLOCK|O_SYNC);
>         close(fd);
>     }
> }
> 
> This was originally found using: while true; do sudo ./stress-ng
> --dev 4 --dev-file /dev/dri/card0 -t 5; done and narrowed down to the
> above reproducer. (cf:
> https://github.com/ColinIanKing/stress-ng/issues/407 )
> 
> This does not occur on pre 6.10 kernels, so it looks like a 6.10 regression.

See the ticket for more details, which also contains a dmesg from a boot
in the VM: https://bugzilla.kernel.org/attachment.cgi?id=306610

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

P.S.: let me use this mail to also add the report to the list of tracked
regressions to ensure it doesn't fall through the cracks:

#regzbot introduced: v6.9..v6.10
#regzbot title: drm: opening and closing /dev/dri/card0 in a QEMU KVM
instance will shutdown system
#regzbot from: Colin Ian King 
#regzbot duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=219007
#regzbot ignore-activity


Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-07-09 Thread Linux regression tracking (Thorsten Leemhuis)
On 30.06.24 01:18, Mikhail Gavrilov wrote:
> On Sat, Jun 29, 2024 at 9:46 PM Rodrigo Siqueira Jordao
>  wrote:
>>
>> I'm trying to reproduce this issue, but until now, I've been unable to
>> reproduce it. I tried some different scenarios with the following
>> components:
>>
>> 1. Displays: I tried with one and two displays
>>   - 4k@120 - DP && 4k@60 - HDMI
>>   - 4k@244 Oled - DP
>> 2. GPU: 7900XTX
> 
> The issue only reproduced with RDNA2 (6900XT)
> RDNA3 (7900XTX) is not affected.

Hmmm, again this looks stalled -- and the regression report is 6 weeks
old by now. :-/ Or was a solution found in between?

So I assume no solution will be ready in time for the 6.10 final? I also
assume a "simple" temporary revert is not an option or bears big risks?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke


Re: [REGRESSION] QXL display malfunction

2024-07-01 Thread Linux regression tracking (Thorsten Leemhuis)
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Thomas, was there some progress on fixing the regression below? I might
have missed something, but from here it looks like this fell through the
cracks.

Makes me wonder if we should temporarily revert this for now to fix this
for rc7 and ensure things get at least one week of testing before the final.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 14.06.24 15:45, Kaplan, David wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
> 
>> -----Original Message-----
>> From: Thomas Zimmermann 
>> Sent: Wednesday, June 12, 2024 9:26 AM
>> To: Linux regressions mailing list 
>> Cc: Petkov, Borislav ;
>> zack.ru...@broadcom.com; dmitry.osipe...@collabora.com; Kaplan, David
>> ; Koenig, Christian ;
>> Dave Airlie ; Maarten Lankhorst
>> ; Maxime Ripard
>> ; LKML ; ML dri-devel
>> ; spice-de...@lists.freedesktop.org;
>> virtualizat...@lists.linux.dev
>> Subject: Re: [REGRESSION] QXL display malfunction
>>
>> Caution: This message originated from an External Source. Use proper
>> caution when opening attachments, clicking links, or responding.
>>
>>
>> Hi
>>
>> Am 12.06.24 um 14:41 schrieb Linux regression tracking (Thorsten Leemhuis):
>>> [CCing a few more people and lists that get_maintainers pointed out
>>> for qxl]
>>>
>>> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
>>> for once, to make this easily accessible to everyone.
>>>
>>> Thomas, from here it looks like this report that apparently is caused
>>> by a change of yours that went into 6.10-rc1 (b33651a5c98dbd
>>> ("drm/qxl: Do not pin buffer objects for vmap")) fell through the
>>> cracks. Or was progress made to resolve this and I just missed this?
>>>
>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker'
>>> hat)
>>> --
>>> Everything you wanna know about Linux kernel regression tracking:
>>> https://linux-regtracking.leemhuis.info/about/#tldr
>>> If I did something stupid, please tell me, as explained on that page.
>>>
>>> #regzbot poke
>>>
>>>
>>> On 03.06.24 04:29, Kaplan, David wrote:
>>>>> -----Original Message-----
>>>>> From: Kaplan, David
>>>>> Sent: Sunday, June 2, 2024 9:25 PM
>>>>> To: tzimmerm...@suse.de; dmitry.osipe...@collabora.com; Koenig,
>>>>> Christian ; zach.ru...@broadcom.com
>>>>> Cc: Petkov, Borislav ;
>>>>> regressi...@list.linux.dev
>>>>> Subject: [REGRESSION] QXL display malfunction
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am running an Ubuntu 19.10 VM with a tip kernel using QXL video
>>>>> and I've observed the VM graphics often malfunction after boot,
>>>>> sometimes failing to load the Ubuntu desktop or even immediately
>> shutting the guest down.
>>>>> When it does load, the guest dmesg log often contains errors like
>>>>>
>>>>> [4.303586] [drm:drm_atomic_helper_commit_planes] *ERROR* head
>> 1
>>>>> wrong: 65376256x16777216+0+0
>>>>> [4.586883] [drm:drm_atomic_helper_commit_planes] *ERROR* head
>> 1
>>>>> wrong: 65376256x16777216+0+0
>>>>> [4.904036] [drm:drm_atomic_helper_commit_planes] *ERROR* head
>> 1
>>>>> wrong: 65335296x16777216+0+0
>>
>> I don't see how these messages are related. Did they already appear before
>> the broken commit was there?
> 
> No, I did not observe them prior to the broken commit.
> 
>>
>>>>> [5.374347] [drm:qxl_release_from_id_locked] *ERROR* failed to find
>> id in
>>>>> release_idr
>>
>> Is there only one such message in the log? Or multiple/frequent ones.
> 
> I would usually only see one.
> 
>>
>> Could you provide a stack trace of what happens before?
> 
> Here's the top of a backtrace when the error occurs:
> #0  qxl_release_from_id_locked (qdev=qdev@entry=0x88810126e000, 
> id=id@entry=262151)
> at drivers/gpu/drm/qxl/qxl_release.c:373
> #1  0x819f5b6a in qxl_garbage_collect (qdev=0x88810126e000)
> at drivers/gpu/drm/qxl/qxl_cmd.c:222
> #2  0x

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-06-21 Thread Linux regression tracking (Thorsten Leemhuis)
On 09.06.24 23:19, Mikhail Gavrilov wrote:
> On Fri, Jun 7, 2024 at 6:39 PM Alex Deucher  wrote:
>>
>> --- a/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.c
>> +++ b/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.c
>> @@ -944,7 +944,7 @@ void optc1_set_drr(
>> OTG_V_TOTAL_MAX_SEL, 1,
>> OTG_FORCE_LOCK_ON_EVENT, 0,
>> OTG_SET_V_TOTAL_MIN_MASK_EN, 0,
>> -   OTG_SET_V_TOTAL_MIN_MASK, 0);
>> +   OTG_SET_V_TOTAL_MIN_MASK, (1 << 1)); /* 
>> TRIGA */
>>
>> // Setup manual flow control for EOF via TRIG_A
>> optc->funcs->setup_manual_trigger(optc);
> 
> Thanks, Alex.
> I applied this patch on top of 771ed66105de and unfortunately the
> issue is not fixed.
> I saw a green flashing bar on top of the screen again.

Hmmm, I might have missed something, but it looks like nothing happened
here since then. What's the status? Is the issue still happening? Any
solution in sight?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke



Re: [PATCH V2] drm/bridge: adv7511: Fix Intermittent EDID failures

2024-06-17 Thread Linux regression tracking (Thorsten Leemhuis)
On 17.06.24 15:14, Adam Ford wrote:
> On Mon, Jun 17, 2024 at 8:00 AM Linux regression tracking (Thorsten
> Leemhuis)  wrote:
>>
>> [CCing the regression list, as it should be in the loop for regressions:
>> https://docs.kernel.org/admin-guide/reporting-regressions.html]
>>
>> Hi! Top-posting for once, to make this easily accessible to everyone.
>>
>> Hmm, seems nobody took a look at the fix below for a regression that
>> seems to be caused by f3d9683346d6b1 ("drm/bridge: adv7511: Allow IRQ
>> to share GPIO pins") [which went into v6.10-rc1].
>>
>> Adam and Dmitry, what are your stances on this patch from Adam? I'm
>> asking, as you authored and committed the culprit, respectively.
> 
> I learned of the regression from Liu Ying [...]

Ohh, I'm very sorry, stupid me somehow missed that the Adam that was
posting the fix was the same Adam that authored the culprit. :-( Seems I
definitely need more coffee (or green tea in my case) or reduce the
number of regressions on the stack. Please accept my apologies.

Thx for the update anyway.

> Dmitry had given me some suggestions, and from that, I posted a V1.
> Dmitry had some more followup suggestions [2] which resulted in the
> V2.
> As far as I know, Liu was satisfied that this addressed the regression
> he reported.

So in that case the main question afaics is why this fix did not make
any progress for more than two weeks now (at least afaics -- or did I
miss something in that area, too?).

Ciao, Thorsten

>> On 01.06.24 15:24, Adam Ford wrote:
>>> In the process of adding support for shared IRQ pins, a scenario
>>> was accidentally created where adv7511_irq_process returned
>>> prematurely causing the EDID to fail randomly.
>>>
>>> Since the interrupt handler is broken up into two main helper functions,
>>> update both of them to treat the helper functions as IRQ handlers. These
>>> IRQ routines process their respective tasks as before, but if they
>>> determine that actual work was done, mark the respective IRQ status
>>> accordingly, and delay the check until everything has been processed.
>>>
>>> This should guarantee the helper functions don't return prematurely
>>> while still returning proper values of either IRQ_HANDLED or IRQ_NONE.
>>>
>>> Reported-by: Liu Ying 
>>> Fixes: f3d9683346d6 ("drm/bridge: adv7511: Allow IRQ to share GPIO pins")
>>> Signed-off-by: Adam Ford 
>>> Tested-by: Liu Ying  # i.MX8MP EVK ADV7535 EDID 
>>> retrieval w/o IRQ
>>> ---
>>> V2:  Fix uninitialized cec_status
>>>  Cut back a little on error handling to return either IRQ_NONE or
>>>  IRQ_HANDLED.
>>>
>>> diff --git a/drivers/gpu/drm/bridge/adv7511/adv7511.h 
>>> b/drivers/gpu/drm/bridge/adv7511/adv7511.h
>>> index ea271f62b214..ec0b7f3d889c 100644
>>> --- a/drivers/gpu/drm/bridge/adv7511/adv7511.h
>>> +++ b/drivers/gpu/drm/bridge/adv7511/adv7511.h
>>> @@ -401,7 +401,7 @@ struct adv7511 {
>>>
>>>  #ifdef CONFIG_DRM_I2C_ADV7511_CEC
>>>  int adv7511_cec_init(struct device *dev, struct adv7511 *adv7511);
>>> -void adv7511_cec_irq_process(struct adv7511 *adv7511, unsigned int irq1);
>>> +int adv7511_cec_irq_process(struct adv7511 *adv7511, unsigned int irq1);
>>>  #else
>>>  static inline int adv7511_cec_init(struct device *dev, struct adv7511 
>>> *adv7511)
>>>  {
>>> diff --git a/drivers/gpu/drm/bridge/adv7511/adv7511_cec.c 
>>> b/drivers/gpu/drm/bridge/adv7511/adv7511_cec.c
>>> index 44451a9658a3..651fb1dde780 100644
>>> --- a/drivers/gpu/drm/bridge/adv7511/adv7511_cec.c
>>> +++ b/drivers/gpu/drm/bridge/adv7511/adv7511_cec.c
>>> @@ -119,7 +119,7 @@ static void adv7511_cec_rx(struct adv7511 *adv7511, int 
>>> rx_buf)
>>>   cec_received_msg(adv7511->cec_adap, &msg);
>>>  }
>>>
>>> -void adv7511_cec_irq_process(struct adv7511 *adv7511, unsigned int irq1)
>>> +int adv7511_cec_irq_process(struct adv7511 *adv7511, unsigned int irq1)
>>>  {
>>>   unsigned int offset = adv7511->info->reg_cec_offset;
>>>   const u32 irq_tx_mask = ADV7511_INT1_CEC_TX_READY |
>>> @@ -130,17 +130,21 @@ void adv7511_cec_irq_process(struct adv7511 *adv7511, 
>>> unsigned int irq1)
>>>   ADV7511_INT1_CEC_RX_READY3;
>>>   unsigned int rx_status;
>>>   int rx_order[3] = { -1, -1, -1 };
>>> - int i;
>>> + int i, ret = 0;
>>> + int irq_status = IRQ_NONE;
>>>

Re: [PATCH V2] drm/bridge: adv7511: Fix Intermittent EDID failures

2024-06-17 Thread Linux regression tracking (Thorsten Leemhuis)
[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

Hi! Top-posting for once, to make this easily accessible to everyone.

Hmm, seems nobody took a look at the fix below for a regression that
seems to be caused by f3d9683346d6b1 ("drm/bridge: adv7511: Allow IRQ to
share GPIO pins") [which went into v6.10-rc1].

Adam and Dmitry, what are your stances on this patch from Adam? I'm
asking, as you authored and committed the culprit, respectively.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 01.06.24 15:24, Adam Ford wrote:
> In the process of adding support for shared IRQ pins, a scenario
> was accidentally created where adv7511_irq_process returned
> prematurely causing the EDID to fail randomly.
> 
> Since the interrupt handler is broken up into two main helper functions,
> update both of them to treat the helper functions as IRQ handlers. These
> IRQ routines process their respective tasks as before, but if they
> determine that actual work was done, mark the respective IRQ status
> accordingly, and delay the check until everything has been processed.
> 
> This should guarantee the helper functions don't return prematurely
> while still returning proper values of either IRQ_HANDLED or IRQ_NONE.
> 
> Reported-by: Liu Ying 
> Fixes: f3d9683346d6 ("drm/bridge: adv7511: Allow IRQ to share GPIO pins")
> Signed-off-by: Adam Ford 
> Tested-by: Liu Ying  # i.MX8MP EVK ADV7535 EDID retrieval 
> w/o IRQ
> ---
> V2:  Fix uninitialized cec_status
>  Cut back a little on error handling to return either IRQ_NONE or
>  IRQ_HANDLED.
> 
> diff --git a/drivers/gpu/drm/bridge/adv7511/adv7511.h 
> b/drivers/gpu/drm/bridge/adv7511/adv7511.h
> index ea271f62b214..ec0b7f3d889c 100644
> --- a/drivers/gpu/drm/bridge/adv7511/adv7511.h
> +++ b/drivers/gpu/drm/bridge/adv7511/adv7511.h
> @@ -401,7 +401,7 @@ struct adv7511 {
>  
>  #ifdef CONFIG_DRM_I2C_ADV7511_CEC
>  int adv7511_cec_init(struct device *dev, struct adv7511 *adv7511);
> -void adv7511_cec_irq_process(struct adv7511 *adv7511, unsigned int irq1);
> +int adv7511_cec_irq_process(struct adv7511 *adv7511, unsigned int irq1);
>  #else
>  static inline int adv7511_cec_init(struct device *dev, struct adv7511 
> *adv7511)
>  {
> diff --git a/drivers/gpu/drm/bridge/adv7511/adv7511_cec.c 
> b/drivers/gpu/drm/bridge/adv7511/adv7511_cec.c
> index 44451a9658a3..651fb1dde780 100644
> --- a/drivers/gpu/drm/bridge/adv7511/adv7511_cec.c
> +++ b/drivers/gpu/drm/bridge/adv7511/adv7511_cec.c
> @@ -119,7 +119,7 @@ static void adv7511_cec_rx(struct adv7511 *adv7511, int 
> rx_buf)
>   cec_received_msg(adv7511->cec_adap, &msg);
>  }
>  
> -void adv7511_cec_irq_process(struct adv7511 *adv7511, unsigned int irq1)
> +int adv7511_cec_irq_process(struct adv7511 *adv7511, unsigned int irq1)
>  {
>   unsigned int offset = adv7511->info->reg_cec_offset;
>   const u32 irq_tx_mask = ADV7511_INT1_CEC_TX_READY |
> @@ -130,17 +130,21 @@ void adv7511_cec_irq_process(struct adv7511 *adv7511, 
> unsigned int irq1)
>   ADV7511_INT1_CEC_RX_READY3;
>   unsigned int rx_status;
>   int rx_order[3] = { -1, -1, -1 };
> - int i;
> + int i, ret = 0;
> + int irq_status = IRQ_NONE;
>  
> - if (irq1 & irq_tx_mask)
> + if (irq1 & irq_tx_mask) {
>   adv_cec_tx_raw_status(adv7511, irq1);
> + irq_status = IRQ_HANDLED;
> + }
>  
>   if (!(irq1 & irq_rx_mask))
> - return;
> + return irq_status;
>  
> - if (regmap_read(adv7511->regmap_cec,
> - ADV7511_REG_CEC_RX_STATUS + offset, &rx_status))
> - return;
> + ret = regmap_read(adv7511->regmap_cec,
> + ADV7511_REG_CEC_RX_STATUS + offset, &rx_status);
> + if (ret < 0)
> + return irq_status;
>  
>   /*
>* ADV7511_REG_CEC_RX_STATUS[5:0] contains the reception order of RX
> @@ -172,6 +176,8 @@ void adv7511_cec_irq_process(struct adv7511 *adv7511, 
> unsigned int irq1)
>  
>   adv7511_cec_rx(adv7511, rx_buf);
>   }
> +
> + return IRQ_HANDLED;
>  }
>  
>  static int adv7511_cec_adap_enable(struct cec_adapter *adap, bool enable)
> diff --git a/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c 
> b/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c
> index 66ccb61e2a66..c8d2c4a157b2 100644
> --- a/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c
> +++ b/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c
> @@ -469,6 +469,8 @@ static int adv7511_irq_process(struct adv7511 *adv7511, 
> bool process_hpd)
>  {
>   unsigned int irq0, irq1;
>   int ret;
> + int cec_status = IRQ_NONE;
> + int irq_status = IRQ_NONE;
>  
>   ret = regmap_read(adv7511->regmap, 
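
The approach described in the commit message (each helper reports whether it did real work, and the final verdict is delayed until every helper has run) can be modelled in plain userspace C. This is a sketch only: every `demo_`/`DEMO_` name, the bit masks, and the event counters are hypothetical and are not the driver's actual code; only the idea of returning either "none" or "handled" is taken from the patch.

```c
/* Stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED return values. */
enum demo_irqreturn { DEMO_IRQ_NONE = 0, DEMO_IRQ_HANDLED = 1 };

/* Hypothetical interrupt status bits, chosen for illustration only. */
#define DEMO_INT0_HPD   (1u << 7)
#define DEMO_INT0_EDID  (1u << 2)
#define DEMO_INT1_CEC   (1u << 0)

/* Counts how often each kind of work was actually performed. */
struct demo_state {
    int hpd_events;
    int edid_events;
    int cec_events;
};

/* Helper treated like an IRQ handler: reports whether it did any work. */
enum demo_irqreturn demo_cec_process(struct demo_state *s, unsigned int irq1)
{
    if (!(irq1 & DEMO_INT1_CEC))
        return DEMO_IRQ_NONE;

    s->cec_events++;
    return DEMO_IRQ_HANDLED;
}

/*
 * Top-level handler: never returns early after the first piece of work.
 * It records what was handled and decides only once everything has been
 * processed, so a pending EDID event is not skipped because of CEC state
 * (or the other way round).
 */
enum demo_irqreturn demo_irq_process(struct demo_state *s,
                                     unsigned int irq0, unsigned int irq1)
{
    enum demo_irqreturn irq_status = DEMO_IRQ_NONE;
    enum demo_irqreturn cec_status;

    if (irq0 & DEMO_INT0_HPD) {
        s->hpd_events++;
        irq_status = DEMO_IRQ_HANDLED;
    }

    if (irq0 & DEMO_INT0_EDID) {
        s->edid_events++;
        irq_status = DEMO_IRQ_HANDLED;
    }

    cec_status = demo_cec_process(s, irq1);

    /* Delayed check: handled if any of the helpers did real work. */
    if (irq_status == DEMO_IRQ_HANDLED || cec_status == DEMO_IRQ_HANDLED)
        return DEMO_IRQ_HANDLED;

    return DEMO_IRQ_NONE;
}
```

Returning "none" when nothing was pending matters on a shared interrupt line, because it lets the core try the other handlers registered for the same line.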

Re: [PATCH v4 1/5] clk: sunxi-ng: common: Support minimum and maximum rate

2024-06-12 Thread Linux regression tracking (Thorsten Leemhuis)
On 23.05.24 20:58, Måns Rullgård wrote:
> Måns Rullgård  writes:
>> Frank Oltmanns  writes:
>>> 21.05.2024 15:43:10 Måns Rullgård :
>>>> Frank Oltmanns  writes:
>>>>
>>>>> The Allwinner SoC's typically have an upper and lower limit for their
>>>>> clocks' rates. Up until now, support for that has been implemented
>>>>> separately for each clock type.
>>>>>
>>>>> Implement that functionality in the sunxi-ng's common part making use of
>>>>> the CCF rate limiting capabilities, so that it is available for all clock
>>>>> types.
>>>>>
>>>>> Suggested-by: Maxime Ripard 
>>>>> Signed-off-by: Frank Oltmanns 
>>>>> Cc: sta...@vger.kernel.org
>>>>> ---
>>>>> drivers/clk/sunxi-ng/ccu_common.c | 19 +++
>>>>> drivers/clk/sunxi-ng/ccu_common.h |  3 +++
>>>>> 2 files changed, 22 insertions(+)
>>>>
>>>> This just landed in 6.6 stable, and it broke HDMI output on an A20 based
>>>> device, the clocks ending up all wrong as seen in this diff of
>>>> /sys/kernel/debug/clk/clk_summary:
> [...]
> 
>>>> Reverting this commit makes it work again.
>>> Thank you for your detailed report!
> [...]
> It turns out HDMI output is broken in v6.9 for a different reason.
> However, this commit (b914ec33b391 clk: sunxi-ng: common: Support
> minimum and maximum rate) requires two others as well in order not
> to break things on the A20:
> 
> cedb7dd193f6 drm/sun4i: hdmi: Convert encoder to atomic
> 9ca6bc246035 drm/sun4i: hdmi: Move mode_set into enable
> 
> With those two (the second depends on the first) cherry-picked on top of
> v6.6.31, the HDMI output is working again.  Likewise on v6.8.10.

From what I can see, they are not yet in 6.6.y or on their way there (6.8
is EOL now). Did anyone ask Greg to pick them up? If not: Måns, could
you maybe do that? CCing him on a reply and asking is likely enough if
both changes apply cleanly.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot introduced: 547263745e15a0
#regzbot fix: drm/sun4i: hdmi: Move mode_set into enable
#regzbot poke
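
The idea behind the commit under discussion, keeping every clock's rate between a lower and an upper limit in one common place, boils down to a clamp. A minimal sketch, assuming a hypothetical `demo_clk_limits` structure in which a `max_rate` of 0 means "no upper limit"; this is not the sunxi-ng or common clock framework code.

```c
/* Hypothetical per-clock limits; max_rate == 0 means "no upper limit". */
struct demo_clk_limits {
    unsigned long min_rate;
    unsigned long max_rate;
};

/* Return `rate` forced into the [min_rate, max_rate] window. */
unsigned long demo_clamp_rate(const struct demo_clk_limits *lim,
                              unsigned long rate)
{
    if (rate < lim->min_rate)
        return lim->min_rate;
    if (lim->max_rate && rate > lim->max_rate)
        return lim->max_rate;
    return rate;
}
```

A limit that is enforced centrally changes which rates a consumer can end up with, which is one way a change like this can alter behaviour for a driver that relied on the previous rate selection.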


Re: [REGRESSION] QXL display malfunction

2024-06-12 Thread Linux regression tracking (Thorsten Leemhuis)
[CCing a few more people and lists that get_maintainers pointed out for qxl]

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Thomas, from here it looks like this report that apparently is caused by
a change of yours that went into 6.10-rc1 (b33651a5c98dbd ("drm/qxl: Do
not pin buffer objects for vmap")) fell through the cracks. Or was
progress made to resolve this and I just missed this?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke


On 03.06.24 04:29, Kaplan, David wrote:
>> -----Original Message-----
>> From: Kaplan, David
>> Sent: Sunday, June 2, 2024 9:25 PM
>> To: tzimmerm...@suse.de; dmitry.osipe...@collabora.com; Koenig,
>> Christian ; zach.ru...@broadcom.com
>> Cc: Petkov, Borislav ; regressi...@list.linux.dev
>> Subject: [REGRESSION] QXL display malfunction
>>
>> Hi,
>>
>> I am running an Ubuntu 19.10 VM with a tip kernel using QXL video and I've
>> observed the VM graphics often malfunction after boot, sometimes failing to
>> load the Ubuntu desktop or even immediately shutting the guest down.
>> When it does load, the guest dmesg log often contains errors like
>>
>> [4.303586] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1
>> wrong: 65376256x16777216+0+0
>> [4.586883] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1
>> wrong: 65376256x16777216+0+0
>> [4.904036] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1
>> wrong: 65335296x16777216+0+0
>> [5.374347] [drm:qxl_release_from_id_locked] *ERROR* failed to find id in
>> release_idr
>>
>> I bisected the issue down to "drm/qxl: Do not pin buffer objects for vmap"
>> (b33651a5c98dbd5a919219d8c129d0674ef74299).
>>
>> The full guest .config and guest XML can be provided if desired.  The guest
>> kernel has QXL support compiled in and the VM has
>>
>> 
>>   > heads="1" primary="yes"/>
>>   > function="0x0"/> 
>>
>> The host is Ubuntu 24.04 (stock) running QEMU version 8.2.2.  The VM is run
>> under virt-manager 4.1.0.  If other information would be helpful, just let me
>> know.
>>
>> Thanks --David Kaplan
> 
> Fixing emails...sorry
> 
> --David Kaplan
> 
> 



Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-06-07 Thread Linux regression tracking (Thorsten Leemhuis)
[CCing the other amd drm maintainers]

On 05.06.24 14:04, Mikhail Gavrilov wrote:
> On Sun, May 26, 2024 at 7:06 PM Mikhail Gavrilov
>  wrote:
>>
>> Day before yesterday I replaced 7900XTX to 6900XT for got clear in
>> which kernel first time appeared warning message "DMA-API: amdgpu
>> :0f:00.0: cacheline tracking EEXIST, overlapping mappings aren't
>> supported".
>> The kernel 6.3 and older won't boot on a computer with Radeon 7900XTX.

Mikhail: are those details in any way relevant? If not, then in the future
best leave them out (or make things easier to follow); they make the bug
report confusing and make it sound like this is just a bug, when in fact
from your bisection it sounds like this is a regression.

Anyway, @amd maintainers: is there a reason why this report did not get
at least a single reply? Or was there some progress somewhere and I just
missed it? Or would it be better if Mikhail would report this to
https://gitlab.freedesktop.org/drm/amd/-/issues/ ?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

>> When I booted the system with 6900XT I saw a green flashing bar on top
>> of the screen when I typed commands in the gnome terminal which was
>> maximized on full screen.
>>
>> Demonstration: https://youtu.be/tTvwQ_5pRkk
>> For reproduction you need Radeon 6900XT GPU connected to 120Hz OLED TV by 
>> HDMI.
>>
>> I bisected the issue and the first commit which I found was 6d4279cb99ac.
>> commit 6d4279cb99ac4f51d10409501d29969f687ac8dc (HEAD)
>> Author: Rodrigo Siqueira 
>> Date:   Tue Mar 26 10:42:05 2024 -0600
>>
>> drm/amd/display: Drop legacy code
>>
>> This commit removes code that are not used by display anymore.
>>
>> Acked-by: Hamza Mahfooz 
>> Signed-off-by: Rodrigo Siqueira 
>> Signed-off-by: Alex Deucher 
>>
>>  drivers/gpu/drm/amd/display/dc/inc/hw/stream_encoder.h |  4 
>>  drivers/gpu/drm/amd/display/dc/inc/resource.h  |  7 ---
>>  drivers/gpu/drm/amd/display/dc/optc/dcn20/dcn20_optc.c | 10 
>> --
>>  drivers/gpu/drm/amd/display/dc/resource/dcn21/dcn21_resource.c | 33
>> +
>>  4 files changed, 1 insertion(+), 53 deletions(-)
>>
>> Every time after bisecting I usually make sure that I found the right
>> commit and build the kernel with revert of the bad commit.
>> But this time I again observed an issue after running a kernel builded
>> without commit 6d4279cb99ac.
>> And I decided to find a second bad commit.
>> The second bad commit has been bc87d666c05.
>> commit bc87d666c05a13e6d4ae1ddce41fc43d2567b9a2 (HEAD)
>> Author: Rodrigo Siqueira 
>> Date:   Tue Mar 26 11:55:19 2024 -0600
>>
>> drm/amd/display: Add fallback configuration for set DRR in DCN10
>>
>> Set OTG/OPTC parameters to 0 if something goes wrong on DCN10.
>>
>> Acked-by: Hamza Mahfooz 
>> Signed-off-by: Rodrigo Siqueira 
>> Signed-off-by: Alex Deucher 
>>
>>  drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.c | 15 ---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> After reverting both these commits on top of 54f71b0369c9 the issue is gone.
>>
>> I also attach the build config.
>>
>> My hardware specs: https://linux-hardware.org/?probe=f25a873c5e
>>
>> Rodrigo or anyone else from the AMD team can you look please.
>>
> 
> Did anyone watch?
> 


Re: 6.10/regression/bisected commit c4cb23111103 causes sleeping function called from invalid context at kernel/locking/mutex.c:585

2024-05-28 Thread Linux regression tracking (Thorsten Leemhuis)
On 22.05.24 23:18, Chris Bainbridge wrote:
> On Tue, May 21, 2024 at 02:39:06PM +0500, Mikhail Gavrilov wrote:
>> Yesterday on the fresh kernel snapshot
>> I spotted a new bug message with follow stacktrace:
>> [4.307097] BUG: sleeping function called from invalid context at
>> kernel/locking/mutex.c:585
> I am also getting this error on every boot. Decoded stacktrace:

TWIMC & for the record: Boris also reported this; Vasant Hegde replied
and said a fix is in the works:

https://lore.kernel.org/all/898d356d-ec7d-41de-82d8-3ed4dc559...@amd.com/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot dup:
https://lore.kernel.org/all/cabxgcsn1z2gj99zsdhqwynptxbymrqhejdff8axxxoiz_0g...@mail.gmail.com/


Re: [PATCH] drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2

2024-05-21 Thread Linux regression tracking (Thorsten Leemhuis)
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Hmm, from here it looks like the patch, now that it was reviewed more
than a week ago, is still not even in -next. Is there a reason?

I know, we are in the merge window. But at the same time this is a fix
(that already lingered on the lists for way too long before it was
reviewed) for a regression in a somewhat recent kernel, so it, in Linus'
own words, should be "expedited"[1].

Or are we again just missing a right person for the job in the CC?
Adding Dave and Sima just in case.

Ciao, Thorsten

[1]
https://lore.kernel.org/all/CAHk-=wis_qqy4odnynnki5b7qhosmxtoj1jxo5wmb6sruwq...@mail.gmail.com/

On 12.05.24 18:11, Limonciello, Mario wrote:
> On 5/10/2024 4:24 AM, Jani Nikula wrote:
>> On Fri, 10 May 2024, "Lin, Wayne"  wrote:
>>>> -----Original Message-----
>>>> From: Limonciello, Mario 
>>>> Sent: Friday, May 10, 2024 3:18 AM
>>>> To: Linux regressions mailing list ;
>>>> Wentland, Harry
>>>> ; Lin, Wayne 
>>>> Cc: ly...@redhat.com; imre.d...@intel.com; Leon Weiß
>>>> >>> bochum.de>; sta...@vger.kernel.org; dri-devel@lists.freedesktop.org;
>>>> amd-
>>>> g...@lists.freedesktop.org; intel-...@lists.freedesktop.org
>>>> Subject: Re: [PATCH] drm/mst: Fix NULL pointer dereference at
>>>> drm_dp_add_payload_part2
>>>>
>>>> On 5/9/2024 07:43, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>>> On 18.04.24 21:43, Harry Wentland wrote:
>>>>>> On 2024-03-07 01:29, Wayne Lin wrote:
>>>>>>> [Why]
>>>>>>> Commit:
>>>>>>> - commit 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload
>>>>>>> allocation/removement") accidently overwrite the commit
>>>>>>> - commit 54d217406afe ("drm: use mgr->dev in drm_dbg_kms in
>>>>>>> drm_dp_add_payload_part2") which cause regression.
>>>>>>>
>>>>>>> [How]
>>>>>>> Recover the original NULL fix and remove the unnecessary input
>>>>>>> parameter 'state' for drm_dp_add_payload_part2().
>>>>>>>
>>>>>>> Fixes: 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload
>>>>>>> allocation/removement")
>>>>>>> Reported-by: Leon Weiß 
>>>>>>> Link:
>>>>>>> https://lore.kernel.org/r/38c253ea42072cc825dc969ac4e6b9b600371cc8.c
>>>>>>> a...@ruhr-uni-bochum.de/
>>>>>>> Cc: ly...@redhat.com
>>>>>>> Cc: imre.d...@intel.com
>>>>>>> Cc: sta...@vger.kernel.org
>>>>>>> Cc: regressi...@lists.linux.dev
>>>>>>> Signed-off-by: Wayne Lin 
>>>>>>
>>>>>> I haven't been deep in MST code in a while but this all looks pretty
>>>>>> straightforward and good.
>>>>>>
>>>>>> Reviewed-by: Harry Wentland 
>>>>>
>>>>> Hmmm, that was three weeks ago, but it seems since then nothing
>>>>> happened to fix the linked regression through this or some other
>>>>> patch. Is there a reason? The build failure report from the CI maybe?
>>>>
>>>> It touches files outside of amd but only has an ack from AMD.  I
>>>> think we
>>>> /probably/ want an ack from i915 and nouveau to take it through.
>>>
>>> Thanks, Mario!
>>>
>>> Hi Thorsten,
>>> Yeah, like what Mario said. Would also like to have ack from i915 and
>>> nouveau.
>>
>> It usually works better if you Cc the folks you want an ack from! ;)
>>
>> Acked-by: Jani Nikula 
>>
> 
> Thanks! Can someone with commit permissions take this to drm-misc?
> 
> 
> 
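
The class of bug described in the commit message quoted above (a debug message reaching through a `state` pointer that may be NULL, where the always-valid manager would do) can be illustrated in userspace. This is a sketch under assumptions: all `demo_` types and functions are hypothetical stand-ins and not the DRM MST API.

```c
#include <stddef.h>

struct demo_dev   { const char *name; };
struct demo_state { struct demo_dev *dev; };
struct demo_mgr   { struct demo_dev *dev; };

/*
 * Variant that depends on `state`: it has to guard against NULL, because
 * some callers have no state to pass. Dereferencing it unconditionally is
 * the kind of mistake that ends in a NULL pointer dereference.
 */
const char *demo_dev_name_from_state(const struct demo_state *state)
{
    if (state == NULL || state->dev == NULL)
        return NULL;
    return state->dev->name;
}

/*
 * Variant in the spirit of the fix: the device is reached through the
 * manager, which every caller already has, so the `state` parameter is
 * not needed at all.
 */
const char *demo_dev_name_from_mgr(const struct demo_mgr *mgr)
{
    return mgr->dev->name;
}
```

Dropping the unneeded parameter, as the patch does, also removes the opportunity for a later refactoring to reintroduce the dereference.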


Re: [PATCH] drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2

2024-05-09 Thread Linux regression tracking (Thorsten Leemhuis)
On 18.04.24 21:43, Harry Wentland wrote:
> On 2024-03-07 01:29, Wayne Lin wrote:
>> [Why]
>> Commit:
>> - commit 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload 
>> allocation/removement")
>> accidently overwrite the commit
>> - commit 54d217406afe ("drm: use mgr->dev in drm_dbg_kms in 
>> drm_dp_add_payload_part2")
>> which cause regression.
>>
>> [How]
>> Recover the original NULL fix and remove the unnecessary input parameter 
>> 'state' for
>> drm_dp_add_payload_part2().
>>
>> Fixes: 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload 
>> allocation/removement")
>> Reported-by: Leon Weiß 
>> Link: 
>> https://lore.kernel.org/r/38c253ea42072cc825dc969ac4e6b9b600371cc8.ca...@ruhr-uni-bochum.de/
>> Cc: ly...@redhat.com
>> Cc: imre.d...@intel.com
>> Cc: sta...@vger.kernel.org
>> Cc: regressi...@lists.linux.dev
>> Signed-off-by: Wayne Lin 
> 
> I haven't been deep in MST code in a while but this all looks
> pretty straightforward and good.
> 
> Reviewed-by: Harry Wentland 

Hmmm, that was three weeks ago, but it seems since then nothing happened
to fix the linked regression through this or some other patch. Is there
a reason? The build failure report from the CI maybe?

Wayne Lin, do you know what's up?

Ciao, Thorsten

>> ---
>>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 2 +-
>>  drivers/gpu/drm/display/drm_dp_mst_topology.c | 4 +---
>>  drivers/gpu/drm/i915/display/intel_dp_mst.c   | 2 +-
>>  drivers/gpu/drm/nouveau/dispnv50/disp.c   | 2 +-
>>  include/drm/display/drm_dp_mst_helper.h   | 1 -
>>  5 files changed, 4 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c 
>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
>> index c27063305a13..2c36f3d00ca2 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
>> @@ -363,7 +363,7 @@ void dm_helpers_dp_mst_send_payload_allocation(
>>  mst_state = to_drm_dp_mst_topology_state(mst_mgr->base.state);
>>  new_payload = drm_atomic_get_mst_payload_state(mst_state, 
>> aconnector->mst_output_port);
>>  
>> -ret = drm_dp_add_payload_part2(mst_mgr, mst_state->base.state, 
>> new_payload);
>> +ret = drm_dp_add_payload_part2(mst_mgr, new_payload);
>>  
>>  if (ret) {
>>  amdgpu_dm_set_mst_status(&aconnector->mst_status,
>> diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c 
>> b/drivers/gpu/drm/display/drm_dp_mst_topology.c
>> index 03d528209426..95fd18f24e94 100644
>> --- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
>> +++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
>> @@ -3421,7 +3421,6 @@ EXPORT_SYMBOL(drm_dp_remove_payload_part2);
>>  /**
>>   * drm_dp_add_payload_part2() - Execute payload update part 2
>>   * @mgr: Manager to use.
>> - * @state: The global atomic state
>>   * @payload: The payload to update
>>   *
>>   * If @payload was successfully assigned a starting time slot by 
>> drm_dp_add_payload_part1(), this
>> @@ -3430,14 +3429,13 @@ EXPORT_SYMBOL(drm_dp_remove_payload_part2);
>>   * Returns: 0 on success, negative error code on failure.
>>   */
>>  int drm_dp_add_payload_part2(struct drm_dp_mst_topology_mgr *mgr,
>> - struct drm_atomic_state *state,
>>   struct drm_dp_mst_atomic_payload *payload)
>>  {
>>  int ret = 0;
>>  
>>  /* Skip failed payloads */
>>  if (payload->payload_allocation_status != 
>> DRM_DP_MST_PAYLOAD_ALLOCATION_DFP) {
>> -drm_dbg_kms(state->dev, "Part 1 of payload creation for %s 
>> failed, skipping part 2\n",
>> +drm_dbg_kms(mgr->dev, "Part 1 of payload creation for %s 
>> failed, skipping part 2\n",
>>  payload->port->connector->name);
>>  return -EIO;
>>  }
>> diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c 
>> b/drivers/gpu/drm/i915/display/intel_dp_mst.c
>> index 53aec023ce92..2fba66aec038 100644
>> --- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
>> +++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
>> @@ -1160,7 +1160,7 @@ static void intel_mst_enable_dp(struct 
>> intel_atomic_state *state,
>>  if (first_mst_stream)
>>  intel_ddi_wait_for_fec_status(encoder, pipe_config, true);
>>  
>> -drm_dp_add_payload_part2(&intel_dp->mst_mgr, &state->base,
>> +drm_dp_add_payload_part2(&intel_dp->mst_mgr,
>>   drm_atomic_get_mst_payload_state(mst_state, 
>> connector->port));
>>  
>>  if (DISPLAY_VER(dev_priv) >= 12)
>> diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c 
>> b/drivers/gpu/drm/nouveau/dispnv50/disp.c
>> index 0c3d88ad0b0e..88728a0b2c25 100644
>> --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
>> +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
>> @@ -915,7 +915,7 @@ nv50_msto_cleanup(struct drm_atomic_state *state,
>>  msto->disabled = false;
>>  
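
The fix quoted above takes the device for the debug message from the
manager instead of from a separate state pointer, which a caller can
apparently pass as NULL. A minimal userspace sketch of that pattern
follows. The names fake_dev, fake_mgr, fake_payload and
add_payload_part2_sketch are hypothetical stand-ins for illustration and
are not the DRM API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel structures; not the real DRM types. */
struct fake_dev { const char *name; };
struct fake_mgr { struct fake_dev *dev; };
struct fake_payload { int allocated; const char *connector_name; };

/*
 * Mirrors the shape of the fixed function: the device used for the debug
 * message comes from the manager, so no separate state pointer (which a
 * caller might pass as NULL) is needed at all.
 */
int add_payload_part2_sketch(struct fake_mgr *mgr, struct fake_payload *payload)
{
    assert(mgr != NULL && mgr->dev != NULL && payload != NULL);

    /* Skip payloads for which part 1 did not succeed. */
    if (!payload->allocated) {
        fprintf(stderr, "[%s] part 1 failed for %s, skipping part 2\n",
                mgr->dev->name, payload->connector_name);
        return -EIO;
    }
    return 0;
}
```

Dropping the unused parameter, as the patch does, also means no caller
can hand in a NULL state by accident.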

Re: [Regression] 6.9.0: WARNING: workqueue: WQ_MEM_RECLAIM ttm:ttm_bo_delayed_delete [ttm] is flushing !WQ_MEM_RECLAIM events:qxl_gc_work [qxl]

2024-05-08 Thread Linux regression tracking (Thorsten Leemhuis)
On 08.05.24 14:35, Anders Blomdell wrote:
> On 2024-05-07 07:04, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 06.05.24 16:30, David Wang wrote:
>>>> On 30.04.24 08:13, David Wang wrote:
>>
>>>> And confirmed that the warning is caused by
>>>> 07ed11afb68d94eadd4ffc082b97c2331307c5ea and reverting it can fix.
>>>
>>> The kernel warning still shows up in 6.9.0-rc7.
>>> (I think 4 high load processes on a 2-Core VM could easily trigger
>>> the kernel warning.)
>>
>> Thx for the report. Linus just reverted the commit 07ed11afb68 you
>> mentioned in your initial mail (I put that quote in again, see above):
>>
>> 3628e0383dd349 ("Reapply "drm/qxl: simplify qxl_fence_wait"")
>> https://git.kernel.org/torvalds/c/3628e0383dd349f02f882e612ab6184e4bb3dc10
>>
>> So this hopefully should be history now.
>>
> Since this affects the 6.8 series (6.8.7 and onwards), I made a CC to
> sta...@vger.kernel.org

Ohh, good idea, I thought Linus had added a stable tag, but that is not
the case. Adding Greg as well and making things explicit:

@Greg: you might want to add 3628e0383dd349 ("Reapply "drm/qxl: simplify
qxl_fence_wait"") to all branches that received 07ed11afb68d94 ("Revert
"drm/qxl: simplify qxl_fence_wait"") (which afaics went into v6.8.7,
v6.6.28, v6.1.87, and v5.15.156).

Ciao, Thorsten


Re: [Regression] 6.9.0: WARNING: workqueue: WQ_MEM_RECLAIM ttm:ttm_bo_delayed_delete [ttm] is flushing !WQ_MEM_RECLAIM events:qxl_gc_work [qxl]

2024-05-06 Thread Linux regression tracking (Thorsten Leemhuis)



On 06.05.24 16:30, David Wang wrote:
>> On 30.04.24 08:13, David Wang wrote:

>> And confirmed that the warning is caused by
>> 07ed11afb68d94eadd4ffc082b97c2331307c5ea and reverting it can fix.
>
> The kernel warning still shows up in 6.9.0-rc7.
> (I think 4 high load processes on a 2-Core VM could easily trigger the kernel 
> warning.)

Thx for the report. Linus just reverted the commit 07ed11afb68 you
mentioned in your initial mail (I put that quote in again, see above):

3628e0383dd349 ("Reapply "drm/qxl: simplify qxl_fence_wait"")
https://git.kernel.org/torvalds/c/3628e0383dd349f02f882e612ab6184e4bb3dc10

So this hopefully should be history now.

Ciao, Thorsten


Re: nouveau: r535.c:1266:3: error: label at end of compound statement default: with gcc-8

2024-04-29 Thread Linux regression tracking (Thorsten Leemhuis)



On 29.04.24 17:06, Naresh Kamboju wrote:
> Following build warnings / errors noticed on Linux next-20240429 tag on
> arm64, arm and riscv with gcc-8; gcc-13 builds pass.
> 
> Reported-by: Linux Kernel Functional Testing 
> 
> Commit id:
>  b58a0bc904ff nouveau: add command-line GSP-RM registry support
> 
> Builds:
> --
>   gcc-8-arm64-defconfig - Fail
>   gcc-8-arm-defconfig - Fail
>   gcc-8-riscv-defconfig - Fail
> 
> Build log:
> 
> drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c: In function 'build_registry':
> drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c:1266:3: error: label at
> end of compound statement
>default:
>^~~
> make[7]: *** [scripts/Makefile.build:244:
> drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.o] Error 1

TWIMC, there is another report about this in this thread (sadly some of
its posts did not make it to lore):

https://lore.kernel.org/all/162ef3c0-1d7b-4220-a21f-b0008657f...@redhat.com/

Ciao, Thorsten

> metadata:
>   git_describe: next-20240429
>   git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
>   git_short_log: b0a2c79c6f35 ("Add linux-next specific files for 20240429")
>   arch: arm64, arm, riscv
>   toolchain: gcc-8
> 
> Steps to reproduce:
> 
> # tuxmake --runtime podman --target-arch arm64 --toolchain gcc-8
> --kconfig defconfig
> 
> Links:
>  - 
> https://storage.tuxsuite.com/public/linaro/lkft/builds/2flcoOuqVJfhTvX4AOYsWMd5hqe/
>  - 
> https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240429/testrun/23704376/suite/build/test/gcc-8-defconfig/history/
>  - 
> https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240429/testrun/23705756/suite/build/test/gcc-8-defconfig/details/
> 
> 
> --
> Linaro LKFT
> https://lkft.linaro.org
> 
> 
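
For context on the gcc-8 error quoted above: C standards before C23
require a label, including "default:", to be followed by a statement, so
a label directly before the closing brace is rejected by older
compilers. A minimal sketch of the usual workaround follows. The enum
and function are hypothetical and are not taken from r535.c.

```c
#include <assert.h>

/* Hypothetical registry value types, used only for illustration. */
enum reg_type { REG_DWORD, REG_STRING, REG_UNKNOWN };

int reg_entry_size(enum reg_type type, int payload_len)
{
    int size = 0;

    assert(payload_len >= 0);

    switch (type) {
    case REG_DWORD:
        size = 4;
        break;
    case REG_STRING:
        size = payload_len + 1; /* room for the terminating NUL */
        break;
    default:
        /*
         * A bare "default:" right before "}" is what gcc-8 rejects;
         * giving the label a statement to attach to avoids the error.
         */
        break;
    }
    return size;
}
```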


Re: [REGRESSION] external monitor+Dell dock in 6.8

2024-04-02 Thread Linux regression tracking (Thorsten Leemhuis)
[Adding a few folks and list while dropping the stable list, as this is
unrelated to it]

On 31.03.24 07:59, Andrei Gaponenko wrote:
> 
> I noticed a regression with the mainline kernel pre-compiled by EPEL.
> I have just tried linux-6.9-rc1.tar.gz from kernel.org, and it still
> misbehaves.
> 
> The default setup: a laptop is connected to a dock, Dell WD22TB4, via
> a USB-C cable.  The dock is connected to an external monitor via a
> Display Port cable.  With a "good" kernel everything works.  With a
> "broken" kernel, the external monitor is still correctly identified by
> the system, and is shown as enabled in plasma systemsettings. The
> system also behaves like the monitor is working, for example, one can
> move the mouse pointer off the laptop screen.  However the external
> monitor screen stays black, and it eventually goes to sleep.

Just a quick heads up to ensure people are aware of it:

Imre Deak, turns out this is caused by a patch of yours: 55eaef16417448
("drm/i915/dp_mst: Handle the Synaptics HBlank expansion quirk"). Andrei
Gaponenko meanwhile filed a ticket about it here:

https://gitlab.freedesktop.org/drm/intel/-/issues/10637

Ciao, Thorsten

> Everything worked with EPEL mainline kernels up to and including
> kernel-ml-6.7.9-1.el9.elrepo.x86_64
> 
> The breakage is observed in
> 
> kernel-ml-6.8.1-1.el9.elrepo.x86_64
> kernel-ml-6.8.2-1.el9.elrepo.x86_64
> linux-6.9-rc1.tar.gz from kernel.org (with olddefconfig)
> 
> Other tests: using an HDMI cable instead of the Display Port cable
> between the monitor and the dock does not change things, black screen
> with the newer kernels.
> 
> Using a small HDMI-to-USB-C adapter instead of the dock results in a
> working system, even with the newer kernels.  So the breakage appears
> to be specific to the Dell WD22TB4 dock.
> 
> Operating System: AlmaLinux 9.3 (Shamrock Pampas Cat)
> 
> uname -mi: x86_64 x86_64
> 
> Laptop: Dell Precision 5470/02RK6V
> 
> lsusb |grep dock
> Bus 003 Device 007: ID 413c:b06e Dell Computer Corp. Dell dock
> Bus 003 Device 008: ID 413c:b06f Dell Computer Corp. Dell dock
> Bus 003 Device 006: ID 0bda:5413 Realtek Semiconductor Corp. Dell dock
> Bus 003 Device 005: ID 0bda:5487 Realtek Semiconductor Corp. Dell dock
> Bus 002 Device 004: ID 0bda:0413 Realtek Semiconductor Corp. Dell dock
> Bus 002 Device 003: ID 0bda:0487 Realtek Semiconductor Corp. Dell dock
> 
> dmesg and kernel config are attached to 
> https://bugzilla.kernel.org/show_bug.cgi?id=218663
> 
> #regzbot introduced: v6.7.9..v6.8.1

P.S.:

#regzbot duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=218663
#regzbot duplicate: https://gitlab.freedesktop.org/drm/intel/-/issues/10637
#regzbot title: drm/i915/dp_mst: external monitor on Dell dock broke


Re: [PATCH 1/1] drm/qxl: fixes qxl_fence_wait

2024-03-20 Thread Linux regression tracking (Thorsten Leemhuis)
On 08.03.24 02:08, Alex Constantino wrote:
> Fix OOM scenario by doing multiple notifications to the OOM handler through
> a busy wait logic.
> Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would
> result in a '[TTM] Buffer eviction failed' exception whenever it reached a
> timeout.
> 
> Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait")
> Link: 
> https://lore.kernel.org/regressions/fb0fda6a-3750-4e1b-893f-97a3e402b...@leemhuis.info
> Reported-by: Timo Lindfors 
> Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054514
> Signed-off-by: Alex Constantino 
> ---
>  drivers/gpu/drm/qxl/qxl_release.c | 20 ++--
>  1 file changed, 14 insertions(+), 6 deletions(-)

Hey Dave and Gerd as well as Thomas, Maarten and Maxime (the latter two
I just added to the CC), it seems to me this regression fix did not
make any progress since it was posted. Did I miss something, is it just
"we are busy with the merge window", or is there some other reason?
Just wondering, I just saw someone on a Fedora IRC channel complaining
about the regression, that's why I'm asking. Would be really good to
finally get this resolved...

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

> diff --git a/drivers/gpu/drm/qxl/qxl_release.c 
> b/drivers/gpu/drm/qxl/qxl_release.c
> index 368d26da0d6a..51c22e7f9647 100644
> --- a/drivers/gpu/drm/qxl/qxl_release.c
> +++ b/drivers/gpu/drm/qxl/qxl_release.c
> @@ -20,8 +20,6 @@
>   * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
>   */
>  
> -#include 
> -
>  #include 
>  
>  #include "qxl_drv.h"
> @@ -59,14 +57,24 @@ static long qxl_fence_wait(struct dma_fence *fence, bool 
> intr,
>  {
>   struct qxl_device *qdev;
>   unsigned long cur, end = jiffies + timeout;
> + signed long iterations = 1;
> + signed long timeout_fraction = timeout;
>  
>   qdev = container_of(fence->lock, struct qxl_device, release_lock);
>  
> - if (!wait_event_timeout(qdev->release_event,
> + // using HZ as a factor since it is used in ttm_bo_wait_ctx too
> + if (timeout_fraction > HZ) {
> + iterations = timeout_fraction / HZ;
> + timeout_fraction = HZ;
> + }
> + for (int i = 0; i < iterations; i++) {
> + if (wait_event_timeout(
> + qdev->release_event,
>   (dma_fence_is_signaled(fence) ||
> -  (qxl_io_notify_oom(qdev), 0)),
> - timeout))
> - return 0;
> + (qxl_io_notify_oom(qdev), 0)),
> + timeout_fraction))
> + break;
> + }
>  
>   cur = jiffies;
>   if (time_after(cur, end))
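
The arithmetic the patch above uses to split one long wait into
HZ-sized waits can be sketched in userspace as follows. The names
wait_plan and split_timeout are hypothetical, and hz is passed in as a
parameter here because the kernel's HZ constant is not available outside
the kernel.

```c
#include <assert.h>

/* Result of splitting one long timeout into equal-sized waits. */
struct wait_plan {
    long iterations;   /* how many waits to perform */
    long per_wait;     /* length of each wait, in ticks */
};

/*
 * Same arithmetic as the patch: timeouts longer than one second (hz ticks)
 * are split into timeout / hz waits of hz ticks each; shorter timeouts are
 * kept as a single wait. Integer division drops any remainder.
 */
struct wait_plan split_timeout(long timeout, long hz)
{
    struct wait_plan plan = { 1, timeout };

    assert(hz > 0 && timeout >= 0);

    if (timeout > hz) {
        plan.iterations = timeout / hz;
        plan.per_wait = hz;
    }
    return plan;
}
```

With timeout = 2500 and hz = 1000 this yields two waits of 1000 ticks
each, so the remaining 500 ticks are not covered by the loop.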


Re: [PATCH] Fix divide-by-zero on DP unplug with nouveau

2024-03-11 Thread Linux regression tracking (Thorsten Leemhuis)
On 11.03.24 17:09, Imre Deak wrote:
> On Sat, Feb 10, 2024 at 09:24:59PM +, Chris Bainbridge wrote:
> Sorry for the delay.

Happens, thx for looking into this!

>> The following trace occurs when using nouveau and unplugging a DP MST
>> adaptor:
> [...] 
>> +if (bpp_x16 == 0)
>> +return 0;
> 
> Could you please move the check to the beginning of the function and add
> a debug message in case bpp_x16 is 0?
> 
> It looks odd that a driver calls this function with a 0 bpp_x16, and
> ideally it should be fixed in the driver. However as it's a regression
> and we don't have a better idea now:
> 
> Acked-by: Imre Deak 

Chris: as this went into 6.8, please consider adding a stable-tag to
ensure Greg picks this up.

Ciao, Thorsten
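
What Imre asks for above (the zero check at the very start of the
function, plus a debug message) might look roughly like the following
userspace sketch. The helper div_round_up_guarded is hypothetical and
does not reproduce the formula in drm_dp_bw_overhead(); it only shows
where the guard sits relative to the division.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: rounds numerator / bpp_x16 up. It is not the
 * formula used by drm_dp_bw_overhead(); it only shows the guard.
 */
uint64_t div_round_up_guarded(uint64_t numerator, uint32_t bpp_x16)
{
    /* Check first, before any arithmetic can reach the division. */
    if (bpp_x16 == 0) {
        fprintf(stderr, "debug: bpp_x16 is 0, returning 0\n");
        return 0;
    }

    /* The rounding addition below must not wrap around. */
    assert(numerator <= UINT64_MAX - bpp_x16);

    return (numerator + bpp_x16 - 1) / bpp_x16;
}
```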



Re: [REGRESSION] Divide-by-zero on DisplayPort MST unplug with nouveau

2024-03-11 Thread Linux regression tracking (Thorsten Leemhuis)
On 07.03.24 18:58, Chris Bainbridge wrote:
> - Forwarded message from Chris Bainbridge  
> -
> 
> Date: Sat, 10 Feb 2024 21:24:59 +

Hmm, it looks like nobody is looking into this regression. Is there a
good reason?

Imre, or did you maybe just miss that Chris' regression seems to be
caused by a commit of yours? He initially proposed a fix (the forwarded
mail that is quoted here) more than a month ago already here:
https://lore.kernel.org/all/ZcfpqwnkSoiJxeT9@debian.local/

Chris recently filed a ticket, too:
https://gitlab.freedesktop.org/drm/misc/kernel/-/issues/36

Mostly silence there as well. :-/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S: Chris, sorry, I had missed that you initially proposed the fix a
month ago; if I had noticed this earlier I would have sent a mail like
this one earlier.
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

> From: Chris Bainbridge 
> To: dri-devel@lists.freedesktop.org
> Cc: ly...@redhat.com, ville.syrj...@linux.intel.com, 
> stanislav.lisovs...@intel.com,
>   mrip...@kernel.org, imre.d...@intel.com
> Subject: [PATCH] Fix divide-by-zero on DP unplug with nouveau
> 
> The following trace occurs when using nouveau and unplugging a DP MST
> adaptor:
>>  divide error:  [#1] PREEMPT SMP PTI
>  CPU: 7 PID: 2962 Comm: Xorg Not tainted 6.8.0-rc3+ #744
>  Hardware name: Razer Blade/DANA_MB, BIOS 01.01 08/31/2018
>  RIP: 0010:drm_dp_bw_overhead+0xb4/0x110 [drm_display_helper]
>  Code: c6 b8 01 00 00 00 75 61 01 c6 41 0f af f3 41 0f af f1 c1 e1 04 48 63 
> c7 31 d2 89 ff 48 8b 5d f8 c9 48 0f af f1 48 8d 44 06 ff <48> f7 f7 31 d2 31 
> c9 31 f6 31 ff 45 31 c0 45 31 c9 45 31 d2 45 31
>  RSP: 0018:b2c5c211fa30 EFLAGS: 00010206
>  RAX:  RBX:  RCX: 00f59b00
>  RDX:  RSI:  RDI: 
>  RBP: b2c5c211fa48 R08: 0001 R09: 0020
>  R10: 0004 R11:  R12: 00023b4a
>  R13: 91d37d165800 R14: 91d36fac6d80 R15: 91d34a764010
>  FS:  7f4a1ca3fa80() GS:91d6edbc() knlGS:
>  CS:  0010 DS:  ES:  CR0: 80050033
>  CR2: 559491d49000 CR3: 00011d180002 CR4: 003706f0
>  Call Trace:
>   
>   ? show_regs+0x6d/0x80
>   ? die+0x37/0xa0
>   ? do_trap+0xd4/0xf0
>   ? do_error_trap+0x71/0xb0
>   ? drm_dp_bw_overhead+0xb4/0x110 [drm_display_helper]
>   ? exc_divide_error+0x3a/0x70
>   ? drm_dp_bw_overhead+0xb4/0x110 [drm_display_helper]
>   ? asm_exc_divide_error+0x1b/0x20
>   ? drm_dp_bw_overhead+0xb4/0x110 [drm_display_helper]
>   ? drm_dp_calc_pbn_mode+0x2e/0x70 [drm_display_helper]
>   nv50_msto_atomic_check+0xda/0x120 [nouveau]
>   drm_atomic_helper_check_modeset+0xa87/0xdf0 [drm_kms_helper]
>   drm_atomic_helper_check+0x19/0xa0 [drm_kms_helper]
>   nv50_disp_atomic_check+0x13f/0x2f0 [nouveau]
>   drm_atomic_check_only+0x668/0xb20 [drm]
>   ? drm_connector_list_iter_next+0x86/0xc0 [drm]
>   drm_atomic_commit+0x58/0xd0 [drm]
>   ? __pfx___drm_printfn_info+0x10/0x10 [drm]
>   drm_atomic_connector_commit_dpms+0xd7/0x100 [drm]
>   drm_mode_obj_set_property_ioctl+0x1c5/0x450 [drm]
>   ? __pfx_drm_connector_property_set_ioctl+0x10/0x10 [drm]
>   drm_connector_property_set_ioctl+0x3b/0x60 [drm]
>   drm_ioctl_kernel+0xb9/0x120 [drm]
>   drm_ioctl+0x2d0/0x550 [drm]
>   ? __pfx_drm_connector_property_set_ioctl+0x10/0x10 [drm]
>   nouveau_drm_ioctl+0x61/0xc0 [nouveau]
>   __x64_sys_ioctl+0xa0/0xf0
>   do_syscall_64+0x76/0x140
>   ? do_syscall_64+0x85/0x140
>   ? do_syscall_64+0x85/0x140
>   entry_SYSCALL_64_after_hwframe+0x6e/0x76
>  RIP: 0033:0x7f4a1cd1a94f
>  Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 
> 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 
> ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
>  RSP: 002b:7ffd2f1df520 EFLAGS: 0246 ORIG_RAX: 0010
>  RAX: ffda RBX: 7ffd2f1df5b0 RCX: 7f4a1cd1a94f
>  RDX: 7ffd2f1df5b0 RSI: c01064ab RDI: 000f
>  RBP: c01064ab R08: 56347932deb8 R09: 56347a7d99c0
>  R10:  R11: 0246 R12: 56347938a220
>  R13: 000f R14: 563479d9f3f0 R15: 
>   
>  Modules linked in: rfcomm xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat 
> nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user 
> xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp 
> llc ccm cmac algif_hash overlay algif_skcipher af_alg bnep binfmt_misc 
> snd_sof_pci_intel_cnl snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_pci 
> snd_sof_xtensa_dsp snd_sof_intel_hda snd_sof snd_sof_utils 
> snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core snd_compress 
> snd_sof_intel_hda_mlink 

Re: [pull] drm/msm: drm-msm-next-2024-02-29 for v6.9

2024-03-05 Thread Linux regression tracking (Thorsten Leemhuis)
On 29.02.24 20:04, Rob Clark wrote:
> 
> This is the main pull for v6.9, description below.
> 
> [...]
>
> GPU:
> - fix sc7180 UBWC config

Why was that queued for 6.9? That is a fix for a 6.8 regression that for
untrained eyes like mine does not look overly dangerous (but of course I
might be wrong with that).

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.


Re: [PATCH] drm/nouveau: keep DMA buffers required for suspend/resume

2024-03-03 Thread Linux regression tracking (Thorsten Leemhuis)
[adding a bunch of lists and people as well as Timur Tabi, who authored
the culprit]

Sid Pranjale, thx for the report. FWIW, I'm just replying to add this to
the regression tracking to ensure it does not fall through the cracks.
Nevertheless let me mention two things while at it:

On 29.02.24 18:58, Sid Pranjale wrote:
> Nouveau deallocates a few buffers post GPU init which are required for GPU 
> suspend/resume to function correctly.
> This is likely not as big an issue on systems where the NVGPU is the only 
> GPU, but on multi-GPU set ups it leads to a regression where the kernel 
> module errors and results in a system-wide rendering freeze.

These lines are too long, see
Documentation/process/submitting-patches.rst for details.

> This commit addresses that regression by moving the two buffers required for 
> suspend and resume to be deallocated at driver unload instead of post init.
> 
> Fixes: 042b5f8 ("drm/nouveau: fix several DMA buffer leaks")

And that should be:

Fixes: 042b5f83841fbf ("drm/nouveau: fix several DMA buffer leaks")

> Signed-off-by: Sid Pranjale 
> ---
>  drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c 
> b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> index a64c81385..a73a5b589 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> @@ -1054,8 +1054,6 @@ r535_gsp_postinit(struct nvkm_gsp *gsp)
>   /* Release the DMA buffers that were needed only for boot and init */
>   nvkm_gsp_mem_dtor(gsp, &gsp->boot.fw);
>   nvkm_gsp_mem_dtor(gsp, &gsp->libos);
> - nvkm_gsp_mem_dtor(gsp, &gsp->rmargs);
> - nvkm_gsp_mem_dtor(gsp, &gsp->wpr_meta);
>  
>   return ret;
>  }
> @@ -2163,6 +2161,8 @@ r535_gsp_dtor(struct nvkm_gsp *gsp)
>  
>   r535_gsp_dtor_fws(gsp);
>  
> + nvkm_gsp_mem_dtor(gsp, &gsp->rmargs);
> + nvkm_gsp_mem_dtor(gsp, &gsp->wpr_meta);
>   nvkm_gsp_mem_dtor(gsp, &gsp->shm.mem);
>   nvkm_gsp_mem_dtor(gsp, &gsp->loginit);
>   nvkm_gsp_mem_dtor(gsp, &gsp->logintr);
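
The lifetime change in the diff above can be illustrated with a small
userspace sketch: buffers needed only for boot are released after init,
while a buffer that resume still needs is kept until teardown. All names
here (fake_gsp and its functions) are hypothetical and only echo the
field names in the patch. They are not nouveau code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical device state; the field names only echo the patch. */
struct fake_gsp {
    void *boot_fw;   /* needed for boot only */
    void *rmargs;    /* needed again on suspend/resume */
};

/* Free one buffer and clear the pointer so a second call is harmless. */
static void mem_dtor(void **mem)
{
    free(*mem);
    *mem = NULL;
}

bool fake_gsp_init(struct fake_gsp *gsp)
{
    assert(gsp != NULL);
    gsp->boot_fw = malloc(16);
    gsp->rmargs = malloc(16);
    if (!gsp->boot_fw || !gsp->rmargs) {
        mem_dtor(&gsp->boot_fw);
        mem_dtor(&gsp->rmargs);
        return false;
    }
    return true;
}

/* After init: release only what is never needed again. */
void fake_gsp_postinit(struct fake_gsp *gsp)
{
    mem_dtor(&gsp->boot_fw);
}

/* Resume can only work if the kept buffer is still there. */
bool fake_gsp_resume(const struct fake_gsp *gsp)
{
    return gsp->rmargs != NULL;
}

/* Teardown: release everything that is left. */
void fake_gsp_dtor(struct fake_gsp *gsp)
{
    mem_dtor(&gsp->boot_fw);
    mem_dtor(&gsp->rmargs);
}
```

Had fake_gsp_postinit() also released rmargs, fake_gsp_resume() would
return false, which is the userspace analogue of the failure described
in the report.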

To be sure the issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, the Linux kernel regression tracking bot:

#regzbot ^introduced 042b5f83841fbf
#regzbot title drm/nouveau: rendering freezes with multi-GPU setup
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.


Re: drm/msm: DisplayPort regressions in 6.8-rc1

2024-02-14 Thread Linux regression tracking (Thorsten Leemhuis)
On 13.02.24 19:00, Abhinav Kumar wrote:
> 
> Thanks for the report.
> 
> I do agree that pm runtime eDP driver got merged that time but I think
> the issue is either a combination of that along with DRM aux bridge
> https://patchwork.freedesktop.org/series/122584/ OR just the latter as
> even that went in around the same time.

In that case allow me a stupid question from the cheap seats:

Is there anything affected users can do to help getting us closer to the
real problem? Like testing a specific commit or two before or after the
merge of one of those features for example? That might help to rule out
a few things.

Ciao, Thorsten

> That's perhaps why this issue was not seen with the chromebooks we tested
> on as they do not use pmic_glink (aux bridge).
> 
> So we will need to debug this on sc8280xp specifically or an equivalent
> device which uses aux bridge.
> 
> On 2/13/2024 3:42 AM, Johan Hovold wrote:
>> Hi,
>>
>> Since 6.8-rc1 the internal eDP display on the Lenovo ThinkPad X13s does
>> not always show up on boot.
>>
>> The logs indicate problems with the runtime PM and eDP rework that went
>> into 6.8-rc1:
>>
>> [    6.006236] Console: switching to colour dummy device 80x25
>> [    6.007542] [drm:dpu_kms_hw_init:1048] dpu hardware
>> revision:0x8000
>> [    6.007872] [drm:drm_bridge_attach [drm]] *ERROR* failed to
>> attach bridge /soc@0/phy@88eb000 to encoder TMDS-31: -16
>> [    6.007934] [drm:dp_bridge_init [msm]] *ERROR* failed to attach
>> panel bridge: -16
>> [    6.007983] msm_dpu ae01000.display-controller:
>> [drm:msm_dp_modeset_init [msm]] *ERROR* failed to create dp bridge: -16
>> [    6.008030] [drm:_dpu_kms_initialize_displayport:588] [dpu
>> error]modeset_init failed for DP, rc = -16
>> [    6.008050] [drm:_dpu_kms_setup_displays:681] [dpu
>> error]initialize_DP failed, rc = -16
>> [    6.008068] [drm:dpu_kms_hw_init:1153] [dpu error]modeset init
>> failed: -16
>> [    6.008388] msm_dpu ae01000.display-controller:
>> [drm:msm_drm_kms_init [msm]] *ERROR* kms hw init failed: -16
>> 
>> and this can also manifest itself as a NULL-pointer dereference:
>>
>> [    7.339447] Unable to handle kernel NULL pointer dereference at
>> virtual address 
>> 
>> [    7.643705] pc : drm_bridge_attach+0x70/0x1a8 [drm]
>> [    7.686415] lr : drm_aux_bridge_attach+0x24/0x38 [aux_bridge]
>> 
>> [    7.769039] Call trace:
>> [    7.771564]  drm_bridge_attach+0x70/0x1a8 [drm]
>> [    7.776234]  drm_aux_bridge_attach+0x24/0x38 [aux_bridge]
>> [    7.781782]  drm_bridge_attach+0x80/0x1a8 [drm]
>> [    7.786454]  dp_bridge_init+0xa8/0x15c [msm]
>> [    7.790856]  msm_dp_modeset_init+0x28/0xc4 [msm]
>> [    7.795617]  _dpu_kms_drm_obj_init+0x19c/0x680 [msm]
>> [    7.800731]  dpu_kms_hw_init+0x348/0x4c4 [msm]
>> [    7.805306]  msm_drm_kms_init+0x84/0x324 [msm]
>> [    7.809891]  msm_drm_bind+0x1d8/0x3a8 [msm]
>> [    7.814196]  try_to_bring_up_aggregate_device+0x1f0/0x2f8
>> [    7.819747]  __component_add+0xa4/0x18c
>> [    7.823703]  component_add+0x14/0x20
>> [    7.827389]  dp_display_probe+0x47c/0x568 [msm]
>> [    7.832052]  platform_probe+0x68/0xd8
>>
>> Users have also reported random crashes at boot since 6.8-rc1, and I've
>> been able to trigger hard crashes twice when testing an external display
>> (USB-C/DP), which may also be related to the DP regressions.
>>
>> I've opened an issue here:
>>
>> https://gitlab.freedesktop.org/drm/msm/-/issues/51
>>
>> but I also want Thorsten's help to track this so that it gets fixed
>> before 6.8 is released.
>>
>> #regzbot introduced: v6.7..v6.8-rc1
>>
>> The following series is likely the culprit:
>>
>> 
>> https://lore.kernel.org/all/1701472789-25951-1-git-send-email-quic_khs...@quicinc.com/
>>
>> Johan
> 
> 


Re: Bug#1061449: linux-image-6.7-amd64: a boot message from amdgpu

2024-01-28 Thread Linux regression tracking (Thorsten Leemhuis)
On 27.01.24 14:14, Salvatore Bonaccorso wrote:
>
> In Debian (https://bugs.debian.org/1061449) we got the following
> quoted report:
> 
> On Wed, Jan 24, 2024 at 07:38:16PM +0100, Patrice Duroux wrote:
>>
>> Giving a try to 6.7, here is a message extracted from dmesg:
>> [4.177226] [ cut here ]
>> [4.177227] WARNING: CPU: 6 PID: 248 at
>> drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_factory.c:387
>> construct_phy+0xb26/0xd60 [amdgpu]
> [...]

Not my area of expertise, but looks a lot like a duplicate of
https://gitlab.freedesktop.org/drm/amd/-/issues/3122#note_2252835

Mario (now CCed) already prepared a patch for that issue that seems to work.

HTH, Ciao, Thorsten


Re: Bug#1054514: linux-image-6.1.0-13-amd64: Debian VM with qxl graphics freezes frequently

2023-12-06 Thread Linux regression tracking (Thorsten Leemhuis)
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Gerd, it seems this regression[1] fell through the cracks. Could you
please take a look? Or is there a good reason why this can't be
addressed? Or was it dealt with and I just missed it?

[1] apparently caused by 5a838e5d5825c8 ("drm/qxl: simplify
qxl_fence_wait") [v5.13-rc1] from Gerd; for details see
https://lore.kernel.org/regressions/ztgydqrlk6wx_...@eldamar.lan/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 24.10.23 23:39, Timo Lindfors wrote:
> Hi,
> 
> On Tue, 24 Oct 2023, Salvatore Bonaccorso wrote:
>> Thanks for the excellently constructed report! I think it's best to
>> forward this directly to upstream including the people for the
>> bisected commit to get some idea.
> 
> Thanks for the quick reply!
> 
>> Can you reproduce the issue with 6.5.8-1 in unstable as well?
> 
> Unfortunately yes:
> 
> ansible@target:~$ uname -r
> 6.5.0-3-amd64
> ansible@target:~$ time sudo ./reproduce.bash
> Wed 25 Oct 2023 12:27:00 AM EEST starting round 1
> Wed 25 Oct 2023 12:27:24 AM EEST starting round 2
> Wed 25 Oct 2023 12:27:48 AM EEST starting round 3
> bug was reproduced after 3 tries
> 
> real    0m48.838s
> user    0m1.115s
> sys 0m45.530s
> 
> I also tested upstream tag v6.6-rc6:
> 
> ...
> + detected_version=6.6.0-rc6
> + '[' 6.6.0-rc6 '!=' 6.6.0-rc6 ']'
> + exec ssh target sudo ./reproduce.bash
> Wed 25 Oct 2023 12:37:16 AM EEST starting round 1
> Wed 25 Oct 2023 12:37:42 AM EEST starting round 2
> Wed 25 Oct 2023 12:38:10 AM EEST starting round 3
> Wed 25 Oct 2023 12:38:36 AM EEST starting round 4
> Wed 25 Oct 2023 12:39:01 AM EEST starting round 5
> Wed 25 Oct 2023 12:39:27 AM EEST starting round 6
> bug was reproduced after 6 tries
> 
> 
> For completeness, here is also the grub_set_default_version.bash script
> that I had to write to automate this (maybe these could be in debian
> wiki?):
> 
> #!/bin/bash
> set -x
> 
> version="$1"
> 
> idx=$(expr $(grep "menuentry " /boot/grub/grub.cfg | sed 1d |grep -n
> "'Debian GNU/Linux, with Linux $version'"|cut -d: -f1) - 1)
> exec sudo grub-set-default "1>$idx"
> 
> 
> 
> -Timo
> 
> 
> 


Re: [PATCH v2 2/2] drm/msm/dp: attach the DP subconnector property

2023-11-21 Thread Linux regression tracking (Thorsten Leemhuis)
On 21.11.23 19:50, Abhinav Kumar wrote:
> On 11/21/2023 9:57 AM, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 15.11.23 19:06, Abhinav Kumar wrote:
>>> On 11/15/2023 12:06 AM, Johan Hovold wrote:
>>>> On Wed, Oct 25, 2023 at 12:23:10PM +0300, Dmitry Baryshkov wrote:
>>>>> While developing and testing the commit bfcc3d8f94f4 ("drm/msm/dp:
>>>>> support setting the DP subconnector type") I had the patch [1] in my
>>>>> tree. I haven't noticed that it was a dependency for the commit in
>>>>> question. Mea culpa.
>>>>
>>>> This also broke boot on the Lenovo ThinkPad X13s.
>>>>
>>>> Would be nice to get this fixed ASAP so that further people don't have
>>>> to debug this known regression.
>>>
>>> I will queue this patch for -fixes right away.
>>
>> Thx. I noticed that this fix is still not in -next. I then investigated
>> and I found it was applied on Thursday last week here:
>> https://gitlab.freedesktop.org/drm/msm/-/commits/msm-fixes?ref_type=heads
>>
>> Makes me wonder: when will that patch go to a branch that is included in
>> -next? And when will it move on towards mainline?
> 
> This has been included in a pull request for 6.7-rc3 to the DRM tree and
> shall make it to -next from there.

Ahh, great, thx, I was slowly getting worried.

Ciao, Thorsten

>>>>> Since the patch has not landed yet (and even was not reviewed)
>>>>> and since one of the bridges erroneously uses USB connector type
>>>>> instead
>>>>> of DP, attach the property directly from the MSM DP driver.
>>>>>
>>>>> This fixes the following oops on DP HPD event:
>>>>>
>>>>>    drm_object_property_set_value
>>>>> (drivers/gpu/drm/drm_mode_object.c:288)
>>>>>    dp_display_process_hpd_high
>>>>> (drivers/gpu/drm/msm/dp/dp_display.c:402)
>>>>>    dp_hpd_plug_handle.isra.0 (drivers/gpu/drm/msm/dp/dp_display.c:604)
>>>>>    hpd_event_thread (drivers/gpu/drm/msm/dp/dp_display.c:1110)
>>>>>    kthread (kernel/kthread.c:388)
>>>>>    ret_from_fork (arch/arm64/kernel/entry.S:858)
>>>>
>>>> This only says where the oops happened, it doesn't necessarily in
>>>> itself
>>>> indicate an oops at all or that in this case it's a NULL pointer
>>>> dereference.
>>>>
>>>> On the X13s I'm seeing the NULL deref in a different path during boot,
>>>> and when this happens after a deferred probe (due to the panel lookup
>>>> mess) it hangs the machine, which makes it a bit of a pain to debug:
>>>>
>>>>  Unable to handle kernel NULL pointer dereference at virtual
>>>> address 0060
>>>>  ...
>>>>  CPU: 4 PID: 57 Comm: kworker/u16:1 Not tainted 6.7.0-rc1 #4
>>>>  Hardware name: Qualcomm QRD, BIOS
>>>> 6.0.220110.BOOT.MXF.1.1-00470-MAKENA-1 01/10/2022
>>>>  ...
>>>>  Call trace:
>>>>   drm_object_property_set_value+0x0/0x88 [drm]
>>>>   dp_display_process_hpd_high+0xa0/0x14c [msm]
>>>>   dp_hpd_plug_handle.constprop.0.isra.0+0x90/0x110 [msm]
>>>>   dp_bridge_atomic_enable+0x184/0x21c [msm]
>>>>   edp_bridge_atomic_enable+0x60/0x94 [msm]
>>>>   drm_atomic_bridge_chain_enable+0x54/0xc8 [drm]
>>>>   drm_atomic_helper_commit_modeset_enables+0x194/0x26c
>>>> [drm_kms_helper]
>>>>   msm_atomic_commit_tail+0x204/0x804 [msm]
>>>>   commit_tail+0xa4/0x18c [drm_kms_helper]
>>>>   drm_atomic_helper_commit+0x19c/0x1b0 [drm_kms_helper]
>>>>   drm_atomic_commit+0xa4/0x104 [drm]
>>>>   drm_client_modeset_commit_atomic+0x22c/0x298 [drm]
>>>>   drm_client_modeset_commit_locked+0x60/0x1c0 [drm]
>>>>   drm_client_modeset_commit+0x30/0x58 [drm]
>>>>   __drm_fb_helper_restore_fbdev_mode_unlocked+0xbc/0xfc
>>>> [drm_kms_helper]
>>>>   drm_fb_helper_set_par+0x30/0x4c [drm_kms_helper]
>>>>   fbcon_init+0x224/0x49c
>>>>   visual_init+0xb0/0x108
>>>>   do_bind_con_driver.isra.0+0x19c/0x38c
>>>>   do_take_over_console+0x140/0x1ec
>>>>   do_fbcon_takeover+0x6c/0xe4
>>>>   fbcon_fb_registered+0x180/0x1f0
>>>>   register_framebuffer+0x19c/0x228
>>>>   __drm_fb_helper_initial_config_and_unlock+0x2e8/0x4e8
>>>> [drm_kms_helper]
>>>>   drm_fb_helper_initial_config+0x3c/0x4c [drm_kms_helper]
>>>>   msm_fbdev_client_hotplug+0x84/0xcc [msm]
>>>>   drm_client_register+0x5c/0xa0 [drm]
>>>>   msm_fbdev_setup+0x94/0x148 [msm]
>>>>   msm_drm_bind+0x3d0/0x42c [msm]
>>>>   try_to_bring_up_aggregate_device+0x1ec/0x2f4
>>>>   __component_add+0xa8/0x194
>>>>   component_add+0x14/0x20
>>>>   dp_display_probe+0x278/0x41c [msm]
>>>>
>>>>> [1] https://patchwork.freedesktop.org/patch/30/
>>>>>
>>>>> Fixes: bfcc3d8f94f4 ("drm/msm/dp: support setting the DP subconnector
>>>>> type")
>>>>> Reviewed-by: Abhinav Kumar 
>>>>> Signed-off-by: Dmitry Baryshkov 
>>>>
>>>> Reviewed-by: Johan Hovold 
>>>> Tested-by: Johan Hovold 
>>>>
>>>
>>> Thanks !
>>>
>>>> Johan
> 
> 


Re: [PATCH v2 2/2] drm/msm/dp: attach the DP subconnector property

2023-11-21 Thread Linux regression tracking (Thorsten Leemhuis)
On 15.11.23 19:06, Abhinav Kumar wrote:
> On 11/15/2023 12:06 AM, Johan Hovold wrote:
>> On Wed, Oct 25, 2023 at 12:23:10PM +0300, Dmitry Baryshkov wrote:
>>> While developing and testing the commit bfcc3d8f94f4 ("drm/msm/dp:
>>> support setting the DP subconnector type") I had the patch [1] in my
>>> tree. I haven't noticed that it was a dependency for the commit in
>>> question. Mea culpa.
>>
>> This also broke boot on the Lenovo ThinkPad X13s.
>>
>> Would be nice to get this fixed ASAP so that further people don't have
>> to debug this known regression.
> 
> I will queue this patch for -fixes right away.

Thx. I noticed that this fix is still not in -next. I then investigated
and I found it was applied on Thursday last week here:
https://gitlab.freedesktop.org/drm/msm/-/commits/msm-fixes?ref_type=heads

Makes me wonder: when will that patch go to a branch that is included in
-next? And when will it move on towards mainline?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

>>> Since the patch has not landed yet (and even was not reviewed)
>>> and since one of the bridges erroneously uses USB connector type instead
>>> of DP, attach the property directly from the MSM DP driver.
>>>
>>> This fixes the following oops on DP HPD event:
>>>
>>>   drm_object_property_set_value (drivers/gpu/drm/drm_mode_object.c:288)
>>>   dp_display_process_hpd_high (drivers/gpu/drm/msm/dp/dp_display.c:402)
>>>   dp_hpd_plug_handle.isra.0 (drivers/gpu/drm/msm/dp/dp_display.c:604)
>>>   hpd_event_thread (drivers/gpu/drm/msm/dp/dp_display.c:1110)
>>>   kthread (kernel/kthread.c:388)
>>>   ret_from_fork (arch/arm64/kernel/entry.S:858)
>>
>> This only says where the oops happened, it doesn't necessarily in itself
>> indicate an oops at all or that in this case it's a NULL pointer
>> dereference.
>>
>> On the X13s I'm seeing the NULL deref in a different path during boot,
>> and when this happens after a deferred probe (due to the panel lookup
>> mess) it hangs the machine, which makes it a bit of a pain to debug:
>>
>>     Unable to handle kernel NULL pointer dereference at virtual
>> address 0060
>>     ...
>>     CPU: 4 PID: 57 Comm: kworker/u16:1 Not tainted 6.7.0-rc1 #4
>>     Hardware name: Qualcomm QRD, BIOS
>> 6.0.220110.BOOT.MXF.1.1-00470-MAKENA-1 01/10/2022
>>     ...
>>     Call trace:
>>  drm_object_property_set_value+0x0/0x88 [drm]
>>  dp_display_process_hpd_high+0xa0/0x14c [msm]
>>  dp_hpd_plug_handle.constprop.0.isra.0+0x90/0x110 [msm]
>>  dp_bridge_atomic_enable+0x184/0x21c [msm]
>>  edp_bridge_atomic_enable+0x60/0x94 [msm]
>>  drm_atomic_bridge_chain_enable+0x54/0xc8 [drm]
>>  drm_atomic_helper_commit_modeset_enables+0x194/0x26c
>> [drm_kms_helper]
>>  msm_atomic_commit_tail+0x204/0x804 [msm]
>>  commit_tail+0xa4/0x18c [drm_kms_helper]
>>  drm_atomic_helper_commit+0x19c/0x1b0 [drm_kms_helper]
>>  drm_atomic_commit+0xa4/0x104 [drm]
>>  drm_client_modeset_commit_atomic+0x22c/0x298 [drm]
>>  drm_client_modeset_commit_locked+0x60/0x1c0 [drm]
>>  drm_client_modeset_commit+0x30/0x58 [drm]
>>  __drm_fb_helper_restore_fbdev_mode_unlocked+0xbc/0xfc
>> [drm_kms_helper]
>>  drm_fb_helper_set_par+0x30/0x4c [drm_kms_helper]
>>  fbcon_init+0x224/0x49c
>>  visual_init+0xb0/0x108
>>  do_bind_con_driver.isra.0+0x19c/0x38c
>>  do_take_over_console+0x140/0x1ec
>>  do_fbcon_takeover+0x6c/0xe4
>>  fbcon_fb_registered+0x180/0x1f0
>>  register_framebuffer+0x19c/0x228
>>  __drm_fb_helper_initial_config_and_unlock+0x2e8/0x4e8
>> [drm_kms_helper]
>>  drm_fb_helper_initial_config+0x3c/0x4c [drm_kms_helper]
>>  msm_fbdev_client_hotplug+0x84/0xcc [msm]
>>  drm_client_register+0x5c/0xa0 [drm]
>>  msm_fbdev_setup+0x94/0x148 [msm]
>>  msm_drm_bind+0x3d0/0x42c [msm]
>>  try_to_bring_up_aggregate_device+0x1ec/0x2f4
>>  __component_add+0xa8/0x194
>>  component_add+0x14/0x20
>>  dp_display_probe+0x278/0x41c [msm]
>>
>>> [1] https://patchwork.freedesktop.org/patch/30/
>>>
>>> Fixes: bfcc3d8f94f4 ("drm/msm/dp: support setting the DP subconnector
>>> type")
>>> Reviewed-by: Abhinav Kumar 
>>> Signed-off-by: Dmitry Baryshkov 
>>
>> Reviewed-by: Johan Hovold 
>> Tested-by: Johan Hovold 
>>
> 
> Thanks !
> 
>> Johan


Re: [REGRESSION]: nouveau: Asynchronous wait on fence

2023-11-21 Thread Linux regression tracking (Thorsten Leemhuis)
On 15.11.23 07:19, Owen T. Heisler wrote:
> On 10/31/23 04:18, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 28.10.23 04:46, Owen T. Heisler wrote:
>>> #regzbot introduced: d386a4b54607cf6f76e23815c2c9a3abc1d66882
>>> #regzbot link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/180
>>>
>>> ## Problem
>>>
>>> 1. Connect external display to DVI port on dock and run X with both
>>>     displays in use.
>>> 2. Wait hours or days.
>>> 3. Suddenly the secondary Nvidia-connected display turns off and X stops
>>>     responding to keyboard/mouse input. In *some* cases it is possible to
>>>     switch to a virtual TTY with Ctrl+Alt+Fn and log in there.
> 
>> You thus might want to check if the problem occurs with 6.6 -- and
>> ideally also check if reverting the culprit there fixes things for you.
> 
> The problem also occurs with v6.6.

You meanwhile might want to give 6.7-rc a try as well, on the off chance
that it improves things, even if that is unlikely.

> Here is a decoded kernel log from an
> untainted kernel:
> 
> https://gitlab.freedesktop.org/drm/nouveau/uploads/c120faf09da46f9c74006df9f1d14442/async-wait-on-fence-180.log
> 
> The culprit commit does not revert cleanly on v6.6. I have not yet
> attempted to resolve the conflicts.
> 
> I have also updated the bug description at
> <https://gitlab.freedesktop.org/drm/nouveau/-/issues/180>.

Maybe one of the nouveau developers can take a quick look at
d386a4b54607cf and suggest a simple way to revert it in latest mainline.
Maybe just removing the main chunk of code that is added is all that it
takes.

Ciao, Thorsten


Re: Radeon regression in 6.6 kernel

2023-11-19 Thread Linux regression tracking (Thorsten Leemhuis)
On 19.11.23 14:24, Bagas Sanjaya wrote:
> On Sun, Nov 19, 2023 at 04:47:01PM +1000, Dave Airlie wrote:
>>> On 12.11.23 01:46, Phillip Susi wrote:
>>>> I had been testing some things on a post 6.6-rc5 kernel for a week or
>>>> two and then when I pulled to a post 6.6 release kernel, I found that
>>>> system suspend was broken.  It seems that the radeon driver failed to
>>>> suspend, leaving the display dead, the wayland display server hung, and
>>>> the system still running.  I have been trying to bisect it for the last
>>>> few days and have only been able to narrow it down to the following 3
>>>> commits:
>>>>
>>>> There are only 'skip'ped commits left to test.
>>>> The first bad commit could be any of:
>>>> 56e449603f0ac580700621a356d35d5716a62ce5
>>>> c07bf1636f0005f9eb7956404490672286ea59d3
>>>> b70438004a14f4d0f9890b3297cd66248728546c
>>>> We cannot bisect more!
>>>
>>> Hmm, not a single reply from the amdgpu folks. Wondering how we can
>>> encourage them to look into this.
>>>
>>> Phillip, reporting issues by mail should still work, but you might have
>>> more luck here, as that's where the amdgpu developers afaics prefer to
>>> track bugs:
>>> https://gitlab.freedesktop.org/drm/amd/-/issues
>>>
>>> When you file an issue there, please mention it here.
>>>
>>> Furthermore it might help if you could verify if 6.7-rc1 (or rc2, which
>>> comes out later today) or 6.6.2-rc1 improve things.

BTW, ignore the "6.6.2-rc1" here, I misunderstood one detail earlier. Sorry.

>> It would also be good to test if reverting any of these is possible or not.

Good point, sorry, forgot to mention that.

> Hi Dave,
> 
> AFAIK commit c07bf1636f0005 ("MAINTAINERS: Update the GPU Scheduler email")
> doesn't seem to have anything to do with this regression, as it doesn't
> change any amdgpu code that may introduce the regression.

Bagas, sorry for being blunt here, I know you mean well. But I feel the
need to say the following in the open, as this otherwise falls back on
me and regression tracking.

Stating the above is not very helpful, as Dave for sure will know.
Telling Phillip that he likely can skip that commit might have been
something different. But I guess even for most users that are able to do
a bisection it's obvious and maybe not worth pointing out.

Ciao, Thorsten


Re: Radeon regression in 6.6 kernel

2023-11-18 Thread Linux regression tracking (Thorsten Leemhuis)
Lo!

On 12.11.23 01:46, Phillip Susi wrote:
> I had been testing some things on a post 6.6-rc5 kernel for a week or
> two and then when I pulled to a post 6.6 release kernel, I found that
> system suspend was broken.  It seems that the radeon driver failed to
> suspend, leaving the display dead, the wayland display server hung, and
> the system still running.  I have been trying to bisect it for the last
> few days and have only been able to narrow it down to the following 3
> commits:
> 
> There are only 'skip'ped commits left to test.
> The first bad commit could be any of:
> 56e449603f0ac580700621a356d35d5716a62ce5
> c07bf1636f0005f9eb7956404490672286ea59d3
> b70438004a14f4d0f9890b3297cd66248728546c
> We cannot bisect more!

Hmm, not a single reply from the amdgpu folks. Wondering how we can
encourage them to look into this.

Phillip, reporting issues by mail should still work, but you might have
more luck here, as that's where the amdgpu developers afaics prefer to
track bugs:
https://gitlab.freedesktop.org/drm/amd/-/issues

When you file an issue there, please mention it here.

Furthermore it might help if you could verify if 6.7-rc1 (or rc2, which
comes out later today) or 6.6.2-rc1 improve things.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

#regzbot poke


> It appears that there was a late merge in the 6.6 window that originally
> forked from the -rc2, as many of the later commits that I bisected had
> that version number.
> 
> I couldn't get it more narrowed down because I had to skip the
> surrounding commits because they wouldn't even boot up to a gui desktop,
> let alone try to suspend.
> 
> When system suspend fails, I find the following in my syslog after I
> have to magic-sysrq reboot because the display is dead:
> 
> Nov 11 18:44:39 faldara kernel: PM: suspend entry (deep)
> Nov 11 18:44:39 faldara kernel: Filesystems sync: 0.035 seconds
> Nov 11 18:44:40 faldara kernel: Freezing user space processes
> Nov 11 18:44:40 faldara kernel: Freezing user space processes completed (elapsed 0.001 seconds)
> Nov 11 18:44:40 faldara kernel: OOM killer disabled.
> Nov 11 18:44:40 faldara kernel: Freezing remaining freezable tasks
> Nov 11 18:44:40 faldara kernel: Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
> Nov 11 18:44:40 faldara kernel: printk: Suspending console(s) (use no_console_suspend to debug)
> Nov 11 18:44:40 faldara kernel: serial 00:01: disabled
> Nov 11 18:44:40 faldara kernel: e1000e: EEE TX LPI TIMER: 0011
> Nov 11 18:44:40 faldara kernel: sd 4:0:0:0: [sdb] Synchronizing SCSI cache
> Nov 11 18:44:40 faldara kernel: sd 1:0:0:0: [sda] Synchronizing SCSI cache
> Nov 11 18:44:40 faldara kernel: sd 5:0:0:0: [sdc] Synchronizing SCSI cache
> Nov 11 18:44:40 faldara kernel: sd 4:0:0:0: [sdb] Stopping disk
> Nov 11 18:44:40 faldara kernel: sd 1:0:0:0: [sda] Stopping disk
> Nov 11 18:44:40 faldara kernel: sd 5:0:0:0: [sdc] Stopping disk
> Nov 11 18:44:40 faldara kernel: amdgpu: Move buffer fallback to memcpy unavailable
> Nov 11 18:44:40 faldara kernel: [TTM] Buffer eviction failed
> Nov 11 18:44:40 faldara kernel: [drm] evicting device resources failed
> Nov 11 18:44:40 faldara kernel: amdgpu :03:00.0: PM: pci_pm_suspend(): amdgpu_pmops_suspend+0x0/0x80 [amdgpu] returns -19
> Nov 11 18:44:40 faldara kernel: amdgpu :03:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -19
> Nov 11 18:44:40 faldara kernel: amdgpu :03:00.0: PM: failed to suspend async: error -19
> Nov 11 18:44:40 faldara kernel: PM: Some devices failed to suspend, or early wake event detected
> Nov 11 18:44:40 faldara kernel: xhci_hcd :06:00.0: xHC error in resume, USBSTS 0x401, Reinit
> Nov 11 18:44:40 faldara kernel: usb usb3: root hub lost power or was reset
> Nov 11 18:44:40 faldara kernel: usb usb4: root hub lost power or was reset
> Nov 11 18:44:40 faldara kernel: serial 00:01: activated
> Nov 11 18:44:40 faldara kernel: nvme nvme0: 4/0/0 default/read/poll queues
> Nov 11 18:44:40 faldara kernel: ata8: SATA link down (SStatus 0 SControl 300)
> Nov 11 18:44:40 faldara kernel: ata7: SATA link down (SStatus 0 SControl 300)
> Nov 11 18:44:40 faldara kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> Nov 11 18:44:40 faldara kernel: ata1: SATA link down (SStatus 4 SControl 300)
> Nov 11 18:44:40 faldara kernel: ata3: SATA link down (SStatus 4 SControl 300)
> Nov 11 18:44:40 faldara kernel: ata4.00: configured for UDMA/133
> Nov 11 18:44:40 faldara kernel: OOM killer enabled.
> Nov 11 18:44:40 faldara kernel: Restarting tasks ... done.
> Nov 11 18:44:40 faldara kernel: random: crng reseeded on system resumption
> Nov 11 18:44:40 faldara kernel: PM: suspend exit
> Nov 11 18:44:40 faldara kernel: PM: suspend entry (s2idle)

Re: Blank screen on boot of Linux 6.5 and later on Lenovo ThinkPad L570

2023-10-25 Thread Linux regression tracking (Thorsten Leemhuis)
On 25.10.23 15:23, Huacai Chen wrote:
> On Wed, Oct 25, 2023 at 6:08 PM Thorsten Leemhuis
>  wrote:
>>
>> Javier, Dave, Sima,
>>
>> On 23.10.23 00:54, Evan Preston wrote:
>>> On 2023-10-20 Fri 05:48pm, Huacai Chen wrote:
>>>> On Fri, Oct 20, 2023 at 5:35 PM Linux regression tracking (Thorsten
>>>> Leemhuis)  wrote:
>>>>> On 09.10.23 10:54, Huacai Chen wrote:
>>>>>> On Mon, Oct 9, 2023 at 4:45 PM Bagas Sanjaya wrote:
>>>>>>> On Mon, Oct 09, 2023 at 09:27:02AM +0800, Huacai Chen wrote:
>>>>>>>> On Tue, Sep 26, 2023 at 10:31 PM Huacai Chen wrote:
>>>>>>>>> On Tue, Sep 26, 2023 at 7:15 PM Linux regression tracking (Thorsten
>>>>>>>>> Leemhuis)  wrote:
>>>>>>>>>> On 13.09.23 14:02, Jaak Ristioja wrote:
>>>>>>>>>>>
>>>>>>>>>>> Upgrading to Linux 6.5 on a Lenovo ThinkPad L570 (Integrated Intel HD
>>>>>>>>>>> Graphics 620 (rev 02), Intel(R) Core(TM) i7-7500U) results in a blank
>>>>>>>>>>> screen after boot until the display manager starts... if it does start
>>>>>>>>>>> at all. Using the nomodeset kernel parameter seems to be a workaround.
>>>>>>>>>>>
>>>>>>>>>>> I've bisected this to commit 60aebc9559492cea6a9625f514a8041717e3a2e4
>>>>>>>>>>> ("drivers/firmware: Move sysfb_init() from device_initcall to
>>>>>>>>>>> subsys_initcall_sync").
>>>>>>>>>>
>>>>>>>> As confirmed by Jaak, disabling DRM_SIMPLEDRM makes things work fine
>>>>>>>> again. So I guess the reason:
>>>>>
>>>>> Well, this to me still looks a lot (please correct me if I'm wrong) like a
>>>>> regression that should be fixed, as DRM_SIMPLEDRM was enabled beforehand
>>>>> if I understood things correctly. Or is there a proper fix for this
>>>>> already in the works and I just missed this? Or is there some good
>>>>> reason why this won't/can't be fixed?
>>>>
>>>> DRM_SIMPLEDRM was enabled but it didn't work at all because there was
>>>> no corresponding platform device. Now DRM_SIMPLEDRM works but it has a
>>>> blank screen. Of course it is valuable to investigate further about
>>>> DRM_SIMPLEDRM on Jaak's machine, but that needs Jaak's effort because
>>>> I don't have the same machine.
>>
>> Side note: Huacai, have you tried working with Jaak to get down to the
>> real problem? Evan, might you be able to help out here?
> No, Jaak has not responded since he 'fixed' his problem by disabling SIMPLEDRM.

Yeah, understood, already suspected something like that, thx for confirming.

>> But I write this mail for a different reason:
>>
>>> I am having the same issue on a Lenovo Thinkpad P70 (Intel
>>> Corporation HD Graphics 530 (rev 06), Intel(R) Core(TM) i7-6700HQ).
>>> Upgrading from Linux 6.4.12 to 6.5 and later results in only a blank
>>> screen after boot and a rapidly flashing device-access-status
>>> indicator.
>>
>> This additional report makes me wonder if we should revert the culprit
>> (60aebc9559492c ("drivers/firmware: Move sysfb_init() from
>> device_initcall to subsys_initcall_sync") [v6.5-rc1]). But I guess that
>> might lead to regressions for some users? But the patch description says
>> that this is not a common configuration, so can we maybe get away with that?
> From my point of view, this is not a regression, 60aebc9559492c
> doesn't cause a problem, but exposes a problem.

From my understanding of Linus' stance in cases like this I think that
aspect doesn't matter. To for example quote
https://lore.kernel.org/lkml/CAHk-=wiP4K8DRJWsCo=20hn_6054xbamgkf2kpguzpb5ama...@mail.gmail.com/

"""
But it ended up exposing another problem, and as such caused a kernel
upgrade to fail for a user. So it got reverted.
"""

For other examples of his view see the bottom half of
https://docs.kernel.org/process/handling-regressions.html

We could bring Linus in to clarify if needed, but I for now didn't CC
him, as I hope we can solve this without h

Re: Blank screen on boot of Linux 6.5 and later on Lenovo ThinkPad L570

2023-10-20 Thread Linux regression tracking (Thorsten Leemhuis)
On 09.10.23 10:54, Huacai Chen wrote:
> On Mon, Oct 9, 2023 at 4:45 PM Bagas Sanjaya  wrote:
>> On Mon, Oct 09, 2023 at 09:27:02AM +0800, Huacai Chen wrote:
>>> On Tue, Sep 26, 2023 at 10:31 PM Huacai Chen  wrote:
>>>> On Tue, Sep 26, 2023 at 7:15 PM Linux regression tracking (Thorsten
>>>> Leemhuis)  wrote:
>>>>> On 13.09.23 14:02, Jaak Ristioja wrote:
>>>>>>
>>>>>> Upgrading to Linux 6.5 on a Lenovo ThinkPad L570 (Integrated Intel HD
>>>>>> Graphics 620 (rev 02), Intel(R) Core(TM) i7-7500U) results in a blank
>>>>>> screen after boot until the display manager starts... if it does start
>>>>>> at all. Using the nomodeset kernel parameter seems to be a workaround.
>>>>>>
>>>>>> I've bisected this to commit 60aebc9559492cea6a9625f514a8041717e3a2e4
>>>>>> ("drivers/firmware: Move sysfb_init() from device_initcall to
>>>>>> subsys_initcall_sync").
>>>>>
>>>>> Hmmm, no reaction since it was posted a while ago, unless I'm missing
>>>>> something.
>>>>>
>>>>> Huacai Chen, did you maybe miss this report? The problem is apparently
>>>>> caused by a commit of yours (that Javier applied), you hence should look
>>>>> into this.
>>>> I'm sorry but it looks very strange, could you please share your config file?
>>> As confirmed by Jaak, disabling DRM_SIMPLEDRM makes things work fine
>>> again. So I guess the reason:
>>
>> Did Jaak reply privately? It should have been disclosed in public
>> ML here instead.
> Yes, he replied privately, and disabling DRM_SIMPLEDRM was suggested by me.

Well, this to me still looks a lot (please correct me if I'm wrong) like a
regression that should be fixed, as DRM_SIMPLEDRM was enabled beforehand
if I understood things correctly. Or is there a proper fix for this
already in the works and I just missed this? Or is there some good
reason why this won't/can't be fixed?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

#regzbot poke

>>> When SIMPLEDRM takes over the framebuffer, the screen is blank (don't
>>> know why). And before 60aebc9559492cea6a9625f ("drivers/firmware: Move
>>> sysfb_init() from device_initcall to subsys_initcall_sync") there is
>>> no platform device created for SIMPLEDRM at early stage, so it seems
>>> also "no problem".
>>
>> I don't understand above. You mean that after that commit the platform
>> device is also none, right?
> No. The SIMPLEDRM driver needs a platform device to work, and that
> commit makes the platform device created earlier. So, before that
> commit, SIMPLEDRM doesn't work, but the screen isn't blank; after that
> commit, SIMPLEDRM works, but the screen is blank.
> 
> Huacai
>>
>> Confused...
>>
>> --
>> An old man doll... just what I always wanted! - Clara
> 
> 


Re: Blank screen on boot of Linux 6.5 and later on Lenovo ThinkPad L570

2023-09-26 Thread Linux regression tracking (Thorsten Leemhuis)
[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

Hi, Thorsten here, the Linux kernel's regression tracker.

On 13.09.23 14:02, Jaak Ristioja wrote:
> 
> Upgrading to Linux 6.5 on a Lenovo ThinkPad L570 (Integrated Intel HD
> Graphics 620 (rev 02), Intel(R) Core(TM) i7-7500U) results in a blank
> screen after boot until the display manager starts... if it does start
> at all. Using the nomodeset kernel parameter seems to be a workaround.
> 
> I've bisected this to commit 60aebc9559492cea6a9625f514a8041717e3a2e4
> ("drivers/firmware: Move sysfb_init() from device_initcall to
> subsys_initcall_sync").

Hmmm, no reaction since it was posted a while ago, unless I'm missing
something.

Huacai Chen, did you maybe miss this report? The problem is apparently
caused by a commit of yours (that Javier applied), you hence should look
into this.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

> git bisect start
> # status: waiting for both good and bad commits
> # good: [6995e2de6891c724bfeb2db33d7b87775f913ad1] Linux 6.4
> git bisect good 6995e2de6891c724bfeb2db33d7b87775f913ad1
> # status: waiting for bad commit, 1 good commit known
> # bad: [2dde18cd1d8fac735875f2e4987f11817cc0bc2c] Linux 6.5
> git bisect bad 2dde18cd1d8fac735875f2e4987f11817cc0bc2c
> # bad: [b775d6c5859affe00527cbe74263de05cfe6b9f9] Merge tag 'mips_6.5'
> of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
> git bisect bad b775d6c5859affe00527cbe74263de05cfe6b9f9
> # good: [3a8a670eeeaa40d87bd38a587438952741980c18] Merge tag
> 'net-next-6.5' of
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
> git bisect good 3a8a670eeeaa40d87bd38a587438952741980c18
> # bad: [188d3f80fc6d8451ab5e570becd6a7b2d3033023] drm/amdgpu: vcn_4_0
> set instance 0 init sched score to 1
> git bisect bad 188d3f80fc6d8451ab5e570becd6a7b2d3033023
> # good: [12fb1ad70d65edc3405884792d044fa79df7244f] drm/amdkfd: update
> process interrupt handling for debug events
> git bisect good 12fb1ad70d65edc3405884792d044fa79df7244f
> # bad: [9cc31938d4586f72eb8e0235ad9d9eb22496fcee] i915/perf: Drop the
> aging_tail logic in perf OA
> git bisect bad 9cc31938d4586f72eb8e0235ad9d9eb22496fcee
> # bad: [51d86ee5e07ccef85af04ee9850b0baa107999b6] drm/msm: Switch to
> fdinfo helper
> git bisect bad 51d86ee5e07ccef85af04ee9850b0baa107999b6
> # good: [bfdede3a58ea970333d77a05144a7bcec13cf515] drm/rockchip: cdn-dp:
> call drm_connector_update_edid_property() unconditionally
> git bisect good bfdede3a58ea970333d77a05144a7bcec13cf515
> # good: [123ee07ba5b7123e0ce0e0f9d64938026c16a2ce] drm: sun4i_tcon: use
> devm_clk_get_enabled in `sun4i_tcon_init_clocks`
> git bisect good 123ee07ba5b7123e0ce0e0f9d64938026c16a2ce
> # bad: [20d54e48d9c705091a025afff5839da2ea606f6b] fbdev: Rename
> fb_mem*() helpers
> git bisect bad 20d54e48d9c705091a025afff5839da2ea606f6b
> # bad: [728cb3f061e2b3a002fd76d91c2449b1497b6640] gpu: drm: bridge: No
> need to set device_driver owner
> git bisect bad 728cb3f061e2b3a002fd76d91c2449b1497b6640
> # bad: [0f1cb4d777281ca3360dbc8959befc488e0c327e] drm/ssd130x: Fix
> include guard name
> git bisect bad 0f1cb4d777281ca3360dbc8959befc488e0c327e
> # good: [0bd5bd65cd2e4d1335ea6c17cd2c8664decbc630] dt-bindings: display:
> simple: Add BOE EV121WXM-N10-1850 panel
> git bisect good 0bd5bd65cd2e4d1335ea6c17cd2c8664decbc630
> # bad: [60aebc9559492cea6a9625f514a8041717e3a2e4] drivers/firmware: Move
> sysfb_init() from device_initcall to subsys_initcall_sync
> git bisect bad 60aebc9559492cea6a9625f514a8041717e3a2e4
> # good: [8bb7c7bca5b70f3cd22d95b4d36029295c4274f6] drm/panel:
> panel-simple: Add BOE EV121WXM-N10-1850 panel support
> git bisect good 8bb7c7bca5b70f3cd22d95b4d36029295c4274f6
> # first bad commit: [60aebc9559492cea6a9625f514a8041717e3a2e4]
> drivers/firmware: Move sysfb_init() from device_initcall to
> subsys_initcall_sync
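
A quoted log like the one above cannot be fed to git directly: the mail
quoting adds a "> " prefix, and the wrapped comment lines lose their
leading "#". A minimal sketch of one way to recover a replayable log,
assuming the mail body was saved to a file, is to keep only the lines that
are git bisect commands and strip the quote prefix. The function name and
the file names in the usage note are placeholders, not anything from the
thread.

```sh
#!/bin/sh
# Hypothetical helper: read a quoted mail on stdin and print only the
# "git bisect ..." command lines, with the "> " quote prefix removed.
# Comment lines and their wrapped continuations are dropped, because
# they do not start with "git bisect " after the prefix.
extract_bisect_cmds() {
    sed -n 's/^> *\(git bisect .*\)$/\1/p'
}
```

Usage would be along the lines of extract_bisect_cmds < mail.txt >
bisect.log, after which git bisect replay bisect.log in a kernel tree
should put the bisection back into the reported state.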


Re: [REGRESSION] HDMI connector detection broken in 6.3 on Intel(R) Celeron(R) N3060 integrated graphics

2023-08-13 Thread Linux regression tracking (Thorsten Leemhuis)
On 11.08.23 20:10, Mikhail Rudenko wrote:
> On 2023-08-11 at 08:45 +02, Thorsten Leemhuis wrote:
>> On 10.08.23 21:33, Mikhail Rudenko wrote:
>>> The following is a copy of an issue I posted to drm/i915 gitlab [1] two
>>> months ago. I repost it to the mailing lists in hope that it will help
>>> the right people pay attention to it.
>>
>> Thx for your report. Wonder why Dmitry (who authored a4e771729a51) or
>> Thomas (who committed it) didn't look into this, but maybe the i915
>> devs didn't forward the report to them.

For the record: they did, and Jani mentioned that already. Sorry, should
have phrased this differently.

>> Let's see if these mails help. Just wondering: does reverting
>> a4e771729a51 from 6.5-rc5 or drm-tip help as well?
> 
> I've redone my tests with 6.5-rc5, and here are the results:
> (1) 6.5-rc5 -> still affected
> (2) 6.5-rc5 + revert a4e771729a51 -> not affected
> (3) 6.5-rc5 + two patches [1][2] suggested on i915 gitlab by @ideak -> not affected (!)
> 
> Should we somehow tell regzbot about (3)?

That's good to know, thx. But the more important things are:

* When will those be merged? They are not in -next yet afaics, so it
might take some time to mainline them, especially at this point of the
devel cycle. Imre, could you try to prod the right people so that these
are ideally upstreamed rather sooner than later, as they fix a regression?
* If possible, they should be tagged for backporting to 6.4, as
this is a regression from the 6.3 cycle.

But yes, let's tell regzbot that fixes are available, too:

#regzbot fix: drm/i915: Fix HPD polling, reenabling the output poll work
as needed

(for the record: that's the second of two patches apparently needed)

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

>> BTW, there was an earlier report about a problem with a4e771729a51 that
>> afaics was never addressed, but it might be unrelated.
>> https://lore.kernel.org/all/20230328023129.3596968-1-zhouzong...@kylinos.cn/
> [1] https://patchwork.freedesktop.org/patch/548590/?series=121050=1
> [2] https://patchwork.freedesktop.org/patch/548591/?series=121050=1



Re: [PATCH 2/2] drm/bridge: lt9611: Do not generate HFP/HBP/HSA and EOT packet

2023-07-26 Thread Linux regression tracking (Thorsten Leemhuis)
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

What's the status wrt this regression (caused by 8ddce13ae69 from
Marek)? It looks like things are stalled and the regression still is
unresolved, but I ask because I might be missing something.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

#regzbot poke

On 14.07.23 08:11, Amit Pundir wrote:
> On Thu, 13 Jul 2023 at 23:58, Marek Vasut  wrote:
>>
>> On 7/13/23 20:09, Abhinav Kumar wrote:
>>>
>>>
>>> On 7/12/2023 10:41 AM, Marek Vasut wrote:
 On 7/9/23 03:03, Abhinav Kumar wrote:
>
>
> On 7/7/2023 1:47 AM, Neil Armstrong wrote:
>> On 07/07/2023 09:18, Neil Armstrong wrote:
>>> Hi,
>>>
>>> On 06/07/2023 11:20, Amit Pundir wrote:
 On Wed, 5 Jul 2023 at 11:09, Dmitry Baryshkov
  wrote:
>
> [Adding freedreno@ to cc list]
>
> On Wed, 5 Jul 2023 at 08:31, Jagan Teki
>  wrote:
>>
>> Hi Amit,
>>
>> On Wed, Jul 5, 2023 at 10:15 AM Amit Pundir
>>  wrote:
>>>
>>> Hi Marek,
>>>
>>> On Wed, 5 Jul 2023 at 01:48, Marek Vasut  wrote:

 Do not generate the HS front and back porch gaps, the HSA gap and
 EOT packet, as these packets are not required. This makes the
 bridge
 work with Samsung DSIM on i.MX8MM and i.MX8MP.
>>>
>>> This patch broke display on Dragonboard 845c (SDM845) devboard
>>> running
>>> AOSP. This is what I see
>>> https://people.linaro.org/~amit.pundir/db845c-userdebug/v6.5-broken-display/PXL_20230704_150156326.jpg.
>>> Reverting this patch fixes this regression for me.
>>
>> Might be msm dsi host require proper handling on these updated
>> mode_flags? did they?
>
> The msm DSI host supports those flags. Also, I'd like to point out
> that the patch didn't change the rest of the driver code. So even if
> drm/msm ignored some of the flags, it should not have caused the
> issue. Most likely the issue is on the lt9611 side. I'd suspect that
> additional programming is required to make it work with these flags.

 I spent some time today on smoke testing these flags (individually
 and
 in limited combination) on DB845c, to narrow down this breakage to
 one
 or more flag(s) triggering it. Here are my observations in limited
 testing done so far.

 There is no regression with MIPI_DSI_MODE_NO_EOT_PACKET when enabled
 alone and system boots to UI as usual.

 MIPI_DSI_MODE_VIDEO_NO_HFP always trigger the broken display as in
 the
 screenshot[1] shared earlier as well.

 Adding either of MIPI_DSI_MODE_VIDEO_NO_HSA and
 MIPI_DSI_MODE_VIDEO_NO_HBP always result in no display, unless paired
 with MIPI_DSI_MODE_VIDEO_NO_HFP and in that case we get the broken
 display as reported.

 In short other than MIPI_DSI_MODE_NO_EOT_PACKET flag, all other flags
 added in this commit break the display on DB845c one way or another.
>>>
>>> I think the investigation would be to understand why samsung-dsim
>>> requires
>>> such flags and/or what are the difference in behavior between MSM
>>> DSI and samsung DSIM
>>> for those flags ?
>>>
>>> If someone has access to the lt9611 datasheet, so it requires
>>> HSA/HFP/HBP to be
>>> skipped ? and does MSM DSI and samsung DSIM skip them in the same
>>> way ?
>>
>> I think there's a mismatch, where on one side this flag sets the
>> link in LP-11 while
>> in HSA/HFP/HBP while on the other it completely removes those
>> blanking packets.
>>
>> The name MIPI_DSI_MODE_VIDEO_NO_HBP suggests removal of HBP, not
>> LP-11 while HBP.
>> the registers used in both controllers are different:
>> - samsung-dsim: DSIM_HBP_DISABLE_MODE
>> - msm dsi: DSI_VID_CFG0_HBP_POWER_STOP
>>
>> The first one suggest removing the packet, while the second one
>> suggests powering
>> off the line while in the blanking packet period.
>>
>> @Abhinav, can you comment on that ?
>>
>
> I dont get what it means by completely removes blanking packets in DSIM.

 MIPI_DSI_MODE_VIDEO_NO_HFP means the HBP period is just skipped by DSIM.

 Maybe there is a need for new set of flags which differentiate between
 HBP skipped (i.e. NO HBP) and HBP LP11 ?

>>>
>>> No, the section of the MIPI DSI spec I posted below clearly states there
>>> are 

Re: [PATCH v2] drm/ast: report connection status on Display Port.

2023-07-10 Thread Linux regression tracking (Thorsten Leemhuis)
On 10.07.23 10:12, Jocelyn Falempe wrote:
> On 06/07/2023 15:03, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 06.07.23 11:58, Jocelyn Falempe wrote:
>>> Aspeed always report the display port as "connected", because it
>>> doesn't set a .detect callback.
>>> Fix this by providing the proper detect callback for astdp and dp501.
>>>
>>> This also fixes the following regression:
>>> Since commit fae7d186403e ("drm/probe-helper: Default to 640x480 if no
>>>   EDID on DP")
>>> The default resolution is now 640x480 when no monitor is connected.
>>> But Aspeed graphics is mostly used in servers, where no monitor
>>> is attached. This also affects the remote BMC resolution to 640x480,
>>> which is inconvenient, and breaks the anaconda installer.
>>>
>>> v2: Add .detect callback to the dp/dp501 connector (Jani Nikula)
>>>
>>> Signed-off-by: Jocelyn Falempe 
>>
>> So if this "also fixes a regression" how about a Fixes: tag and a CC:
>> > also in all affected stable and longterm kernels?
> 
> In this case, the regression only affect one userspace program
> (anaconda),

That is (mostly) irrelevant when it comes to regressions.

> and the fix looks too risky to backport to all stable kernels.

Not sure, but I tend to think that decision would better be left to the
stable team. Each developer will have a different opinion about what's
too risky or not and they might be in the better position to judge what
they want for their trees. A "Fixes:" tag thus still seems appropriate
here; it will also tell downstream distros that might want to pick this up.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.


Re: [PATCH 2/2] drm/bridge: lt9611: Do not generate HFP/HBP/HSA and EOT packet

2023-07-08 Thread Linux regression tracking (Thorsten Leemhuis)
[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 05.07.23 06:45, Amit Pundir wrote:
> 
> On Wed, 5 Jul 2023 at 01:48, Marek Vasut  wrote:
>>
>> Do not generate the HS front and back porch gaps, the HSA gap and
>> EOT packet, as these packets are not required. This makes the bridge
>> work with Samsung DSIM on i.MX8MM and i.MX8MP.
> 
> This patch broke display on Dragonboard 845c (SDM845) devboard running
> AOSP. This is what I see
> https://people.linaro.org/~amit.pundir/db845c-userdebug/v6.5-broken-display/PXL_20230704_150156326.jpg.
> Reverting this patch fixes this regression for me.

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 8ddce13ae69
#regzbot title drm/bridge: lt9611: Dragonboard 845c (SDM845) devboard
broken when running AOSP
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
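
For illustration only: the per-flag observations Amit reported for DB845c
elsewhere in this thread can be summarized as a small decision function.
This is a minimal sketch of the reported test results, not driver code.
All names are hypothetical and the bit values are placeholders, not the
kernel's MIPI DSI definitions; the results come from limited testing.

```c
#include <stdint.h>

/* Placeholder bit values for this sketch only; the real flag
 * definitions live in the kernel's MIPI DSI header and may differ. */
#define SKETCH_NO_HFP        (1u << 0)
#define SKETCH_NO_HBP        (1u << 1)
#define SKETCH_NO_HSA        (1u << 2)
#define SKETCH_NO_EOT_PACKET (1u << 3)

enum sketch_result {
	SKETCH_DISPLAY_OK,      /* boots to UI as usual */
	SKETCH_DISPLAY_BROKEN,  /* garbled output, as in the photo */
	SKETCH_NO_DISPLAY       /* nothing shown at all */
};

/* Encodes what was observed on DB845c, per the report:
 *  - NO_HFP always gave the broken display;
 *  - NO_HSA or NO_HBP without NO_HFP gave no display;
 *  - NO_EOT_PACKET alone caused no regression. */
enum sketch_result sketch_db845c_observed(uint32_t flags)
{
	if (flags & SKETCH_NO_HFP)
		return SKETCH_DISPLAY_BROKEN;
	if (flags & (SKETCH_NO_HBP | SKETCH_NO_HSA))
		return SKETCH_NO_DISPLAY;
	return SKETCH_DISPLAY_OK;
}
```

Calling sketch_db845c_observed(SKETCH_NO_EOT_PACKET) yields
SKETCH_DISPLAY_OK, which matches the one flag reported as harmless.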


Re: [PATCH v2] drm/ast: report connection status on Display Port.

2023-07-06 Thread Linux regression tracking (Thorsten Leemhuis)
On 06.07.23 11:58, Jocelyn Falempe wrote:
> Aspeed always report the display port as "connected", because it
> doesn't set a .detect callback.
> Fix this by providing the proper detect callback for astdp and dp501.
> 
> This also fixes the following regression:
> Since commit fae7d186403e ("drm/probe-helper: Default to 640x480 if no
>  EDID on DP")
> The default resolution is now 640x480 when no monitor is connected.
> But Aspeed graphics is mostly used in servers, where no monitor
> is attached. This also affects the remote BMC resolution to 640x480,
> which is inconvenient, and breaks the anaconda installer.
> 
> v2: Add .detect callback to the dp/dp501 connector (Jani Nikula)
> 
> Signed-off-by: Jocelyn Falempe 

So if this "also fixes a regression" how about a Fixes: tag and a CC:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.


Re: [PATCH 1/2] fbdev/offb: Update expected device name

2023-06-15 Thread Linux regression tracking (Thorsten Leemhuis)
On 16.04.23 14:34, Salvatore Bonaccorso wrote:
> 
> On Wed, Apr 12, 2023 at 11:55:08AM +0200, Cyril Brulebois wrote:
>> Since commit 241d2fb56a18 ("of: Make OF framebuffer device names unique"),
>> as spotted by Frédéric Bonnard, the historical "of-display" device is
>> gone: the updated logic creates "of-display.0" instead, then as many
>> "of-display.N" as required.
>>
>> This means that offb no longer finds the expected device, which prevents
>> the Debian Installer from setting up its interface, at least on ppc64el.
>>
>> It might be better to iterate on all possible nodes, but updating the
>> hardcoded device from "of-display" to "of-display.0" is confirmed to fix
>> the Debian Installer at the very least.
> [...]
> #regzbot ^introduced 241d2fb56a18
> #regzbot title: Open Firmware framebuffer cannot find of-display
> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=217328
> #regzbot link: 
> https://lore.kernel.org/all/20230412095509.2196162-1-cy...@debamax.com/T/#m34493480243a2cad2ae359abfd9db5e755f41add
> #regzbot link: https://bugs.debian.org/1033058

No reply to my status inquiry[1] a few weeks ago, so I have to assume
nobody cares anymore. If somebody still cares, holler!

#regzbot inconclusive: no answer to a status inquiry
#regzbot ignore-activity

[1]
https://lore.kernel.org/lkml/d1aee7d3-05f6-0920-b8e1-4ed5cf3f9...@leemhuis.info/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.


Re: [PATCH 2/2] drm/ofdrm: Update expected device name

2023-05-22 Thread Linux regression tracking (Thorsten Leemhuis)
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Was a proper solution for the regression the initial mail in this thread
is about ever found? Doesn't look like it from here, but maybe I'm
missing something.

Reminder, the problem afaik is caused by 241d2fb56a ("of: Make OF
framebuffer device names unique") [merged for v6.2-rc8, authored by
Michal Suchanek; committed by Rob Herring].

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 24.04.23 11:35, Helge Deller wrote:
> On 4/24/23 11:07, Thomas Zimmermann wrote:
>> Am 24.04.23 um 09:33 schrieb Geert Uytterhoeven:
>>> On Wed, Apr 12, 2023 at 12:05 PM Cyril Brulebois 
>>> wrote:
>>>> Since commit 241d2fb56a18 ("of: Make OF framebuffer device names
>>>> unique"),
>>>> as spotted by Frédéric Bonnard, the historical "of-display" device is
>>>> gone: the updated logic creates "of-display.0" instead, then as many
>>>> "of-display.N" as required.
>>>>
>>>> This means that offb no longer finds the expected device, which
>>>> prevents
>>>> the Debian Installer from setting up its interface, at least on
>>>> ppc64el.
>>>>
>>>> Given the code similarity it is likely to affect ofdrm in the same way.
>>>>
>>>> It might be better to iterate on all possible nodes, but updating the
>>>> hardcoded device from "of-display" to "of-display.0" is likely to help
>>>> as a first step.
>>>>
>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217328
>>>> Link: https://bugs.debian.org/1033058
>>>> Fixes: 241d2fb56a18 ("of: Make OF framebuffer device names unique")
>>>> Cc: sta...@vger.kernel.org # v6.2+
>>>> Signed-off-by: Cyril Brulebois 
>>>
>>> Thanks for your patch, which is now commit 3a9d8ea2539ebebd
>>> ("drm/ofdrm: Update expected device name") in fbdev/for-next.
>>>
>>>> --- a/drivers/gpu/drm/tiny/ofdrm.c
>>>> +++ b/drivers/gpu/drm/tiny/ofdrm.c
>>>> @@ -1390,7 +1390,7 @@ MODULE_DEVICE_TABLE(of, ofdrm_of_match_display);
>>>>
>>>>   static struct platform_driver ofdrm_platform_driver = {
>>>>  .driver = {
>>>> -   .name = "of-display",
>>>> +   .name = "of-display.0",
>>>>  .of_match_table = ofdrm_of_match_display,
>>>>  },
>>>>  .probe = ofdrm_probe,
>>>
>>> Same comment as for "[PATCH 1/2] fbdev/offb: Update expected device
>>> name".
>>>
>>> https://lore.kernel.org/r/camuhmdvgeeasmb4tauuqqgj-4+bbetwewyja+m9nyjv0bj_...@mail.gmail.com
>>
>> Sorry that I missed this patch. I agree that it's probably not
>> correct. At least in ofdrm, we want to be able to use multiple
>> framebuffers at the same time; a feature that has been broken by this
>> change.
> 
> Geert & Thomas, thanks for the review!
> 
> I've dropped both patches from fbdev tree for now.
> Would be great to find another good solution though, as it breaks the
> debian
> installer.
> 
> Helge


Re: [PATCH] drm/probe_helper: fix the warning reported when calling drm_kms_helper_poll_disable during suspend

2023-05-17 Thread Linux regression tracking (Thorsten Leemhuis)
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Dmitry, was any progress made to address this regression? Doesn't look
like it, but I strongly suspect I'm missing something, as I'm not really
sure if I properly understood this thread. It sounded a bit like
a4e771729a51 should be reverted for now until all
drm_kms_helper_poll_disable() calls have been verified. Is that right?
Or did somebody already verify and fix all of them with bugs?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 28.04.23 03:17, zongmin zhou wrote:
> On Wed, 2023-04-26 at 16:10 +0300, Dmitry Baryshkov wrote:
>> On Wed, 26 Apr 2023 at 12:09, zongmin zhou 
>> wrote:
>>> On Sun, 2023-04-23 at 22:51 +0200, Janne Grunau wrote:
>>>> On 2023-04-20 23:07:01 +0300, Dmitry Baryshkov wrote:
>>>>> On Thu, 20 Apr 2023 at 23:01, Janne Grunau 
>>>>> wrote:
>>>>>>
>>>>>> On 2023-03-28 10:31:29 +0800, Zongmin Zhou wrote:
>>>>>>> When drivers call drm_kms_helper_poll_disable from
>>>>>>> their device suspend implementation without enabled output
>>>>>>> polling before,
>>>>>>> following warning will be reported, due to work->func not be
>>>>>>> initialized:
>>>>>>
>>>>>> we see the same warning with the work in progress kms driver
>>>>>> for
>>>>>> apple
>>>>>> silicon SoCs. The connectors do not need to be polled so the
>>>>>> driver
>>>>>> never
>>>>>> calls drm_kms_helper_poll_init().
>>>>>>
>>>>>>> [   55.141361] WARNING: CPU: 3 PID: 372 at
>>>>>>> kernel/workqueue.c:3066 __flush_work+0x22f/0x240
>>>>>>> [   55.141382] Modules linked in: nls_iso8859_1
>>>>>>> snd_hda_codec_generic ledtrig_audio snd_hda_intel
>>>>>>> snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec
>>>>>>> snd_hda_core
>>>>>>> snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event
>>>>>>> snd_rawmidi
>>>>>>> snd_seq intel_rapl_msr intel_rapl_common bochs
>>>>>>> drm_vram_helper
>>>>>>> drm_ttm_helper snd_seq_device nfit ttm crct10dif_pclmul
>>>>>>> snd_timer ghash_clmulni_intel binfmt_misc sha512_ssse3
>>>>>>> aesni_intel drm_kms_helper joydev input_leds syscopyarea
>>>>>>> crypto_simd snd cryptd sysfillrect sysimgblt mac_hid
>>>>>>> serio_raw
>>>>>>> soundcore qemu_fw_cfg sch_fq_codel msr parport_pc ppdev lp
>>>>>>> parport drm ramoops reed_solomon pstore_blk pstore_zone
>>>>>>> efi_pstore virtio_rng ip_tables x_tables autofs4
>>>>>>> hid_generic
>>>>>>> usbhid hid ahci virtio_net i2c_i801 crc32_pclmul psmouse
>>>>>>> virtio_scsi libahci i2c_smbus lpc_ich xhci_pci net_failover
>>>>>>> virtio_blk xhci_pci_renesas failover
>>>>>>> [   55.141430] CPU: 3 PID: 372 Comm: kworker/u16:9 Not
>>>>>>> tainted
>>>>>>> 6.2.0-rc6+ #16
>>>>>>> [   55.141433] Hardware name: QEMU Standard PC (Q35 + ICH9,
>>>>>>> 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org
>>>>>>> 04/01/2014
>>>>>>> [   55.141435] Workqueue: events_unbound async_run_entry_fn
>>>>>>> [   55.141441] RIP: 0010:__flush_work+0x22f/0x240
>>>>>>> [   55.141444] Code: 8b 43 28 48 8b 53 30 89 c1 e9 f9 fe ff
>>>>>>> ff
>>>>>>> 4c 89 f7 e8 b5 95 d9 00 e8 00 53 08 00 45 31 ff e9 11 ff ff
>>>>>>> ff
>>>>>>> 0f 0b e9 0a ff ff ff <0f> 0b 45 31 ff e9 00 ff ff ff e8 e2
>>>>>>> 54
>>>>>>> d8 00 66 90 90 90 90 90 90
>>>>>>> [   55.141446] RSP: 0018:ff59221940833c18 EFLAGS: 00010246
>>>>>>> [   55.141449] RAX:  RBX: 
>>>>>>> RCX:
>>>>>>> 9b72bcbe
>>>>>>> [   55.141450] RDX: 0001 RSI: 0001
>>>>>>> RDI:
>>>>>>> ff3ea01e4265e330
>>>>>>> [   55.141451] RBP: ff59221940833c90 R08: 
>>>>>>> R09:
>>>>>>> 8080808080808080
>>>>>>> [   55.141453] R10: ff3ea01e42b3caf4 R11: 000f
>>>>>>> R12:
>>>>>>> ff3ea01e4265e330
>>>>>>> [   55.141454] R13: 0001 R14: ff3ea01e505e5e80
>>>>>>> R15:
>>>>>>> 0001
>>>>>>> [   55.141455] FS:  ()
>>>>>>> GS:ff3ea01fb7cc() knlGS:
>>>>>>> [   55.141456] CS:  0010 DS:  ES:  CR0:
>>>>>>> 80050033
>>>>>>> [   55.141458] CR2: 563543ad1546 CR3: 00010ee82005
>>>>>>> CR4:
>>>>>>> 00771ee0
>>>>>>> [   55.141464] DR0:  DR1: 
>>>>>>> DR2:
>>>>>>> 
>>>>>>> [   55.141465] DR3:  DR6: fffe0ff0
>>>>>>> DR7:
>>>>>>> 0400
>>>>>>> [   55.141466] PKRU: 5554
>>>>>>> [   55.141467] Call Trace:
>>>>>>> [   55.141469]  
>>>>>>> [   55.141472]  ? pcie_wait_cmd+0xdf/0x220
>>>>>>> [   55.141478]  ? mptcp_seq_show+0xe0/0x180
>>>>>>> [   55.141484]  __cancel_work_timer+0x124/0x1b0
>>>>>>> [   55.141487]  cancel_delayed_work_sync+0x17/0x20
>>>>>>> [   55.141490]  drm_kms_helper_poll_disable+0x26/0x40
>>>>>>> [drm_kms_helper]
>>>>>>> [   55.141516]  
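
The failure mode described in the quoted report (a poll-disable path that
cancels a work item whose function was never initialized, because the
driver never called the poll-init helper) can be modelled in user space.
The following is a minimal sketch under that assumption. Every type and
function name here is hypothetical; it is neither the kernel's workqueue
API nor the fix that was eventually applied.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a delayed work item. */
struct sketch_work {
	void (*func)(struct sketch_work *work);
	bool pending;
};

/* Hypothetical stand-in for the per-device polling state. */
struct sketch_dev {
	struct sketch_work poll_work;
	bool poll_initialized;
};

static void sketch_poll_fn(struct sketch_work *work)
{
	(void)work; /* a real poll function would probe connectors here */
}

/* Models the init helper that some drivers never call. */
void sketch_poll_init(struct sketch_dev *dev)
{
	dev->poll_work.func = sketch_poll_fn;
	dev->poll_work.pending = false;
	dev->poll_initialized = true;
}

/* Models a disable path that refuses to touch uninitialized work.
 * Returns true if the work was cancelled, false if it was skipped. */
bool sketch_poll_disable(struct sketch_dev *dev)
{
	if (!dev->poll_initialized || dev->poll_work.func == NULL)
		return false; /* nothing to cancel; avoids the warning path */

	dev->poll_work.pending = false;
	return true;
}
```

With a zero-initialized struct sketch_dev, sketch_poll_disable() returns
false until sketch_poll_init() has run, which is the situation the
reporters hit on suspend.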

Re: Fwd: Kernel 5.11 crashes when it boots, it produces black screen.

2023-05-10 Thread Linux regression tracking (Thorsten Leemhuis)
Hi!

On 10.05.23 10:26, Bagas Sanjaya wrote:
> 
> I noticed a regression report on Bugzilla ([1]). As many developers don't
> have a look on it, I decided to forward it by email. See the report
> for the full thread.
> 
> Quoting from the report:
> 
>>  Azamat S. Kalimoulline 2021-04-06 15:45:08 UTC
>>
>> Same as in https://bugzilla.kernel.org/show_bug.cgi?id=212133, but not 
>> StoneyRidge related. I have same issue in 5.11.9 kernel, but on Renoir 
>> architecture. I have AMD Ryzen 5 PRO 4650U with Radeon Graphics. Same stuck 
>> on loading initial ramdisk. modprobe.blacklist=amdgpu 3` didn't help to 
>> boot. Same stuck. Also iommu=off and acpi=off too. 5.10.26 boots fine. I 
>> boot via efi and I have no option boot without it.
> 
> Azamat, can you try reproducing this issue on latest mainline?
>
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=212579

Bagas, thx for all your help with regression tracking, much appreciated
(side note, as I'm curious for a while already: what is your motivation?
Just want to help? But whatever, any help is great!).

That being said: I'm not sure if I like what you did in this particular
case, as developers might start getting annoyed by regression tracking
if we throw too many bug reports of lesser quality at their feet --
and then they might start to ignore us, which we really need to prevent.

That's why I would not have forwarded that report at this point of time,
mainly for these reasons:

 * The initial report is quite old already, as it fell through the
cracks (not good, but happens; sorry Azamat!). Hence in this case it
would definitely be better to *first* ask the reporter to check if the
problem still happens with latest mainline (or at least latest stable)
before involving the kernel developers, as it might have been fixed
already.

 * This might not be an amdgpu bug at all; in fact the other bug the
reporter mentioned was an iommu thing. Hence this might be one of those
regressions where a bisection is the only way to get down to the
problem. Sure, sending a few developers a quick inquiry along the lines
of "do you maybe have an idea what's up there" is fine, but that's not
what you did in your mail. Your list of recipients is also quite long;
that's risky: if you do that too often, they might start
ignoring mail from you.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.


Re: PROBLEM: AMD Ryzen 9 7950X iGPU - Blinking Issue

2023-05-02 Thread Linux regression tracking (Thorsten Leemhuis)
On 02.05.23 15:48, Felix Richter wrote:
> On 5/2/23 15:34, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 02.05.23 15:13, Alex Deucher wrote:
>>> On Tue, May 2, 2023 at 7:45 AM Linux regression tracking (Thorsten
>>> Leemhuis)  wrote:
>>>
>>>> On 30.04.23 13:44, Felix Richter wrote:
>>>>> Hi,
>>>>>
>>>>> I am running into an issue with the integrated GPU of the Ryzen 9
>>>>> 7950X. It seems to be a regression from kernel version 6.1 to 6.2.
>>>>> The bug materializes in the form of my monitor blinking, meaning it
>>>>> turns full white shortly. This happens very often so that the
>>>>> system becomes unpleasant to use.
>>>>>
>>>>> I am running the Archlinux Kernel:
>>>>> The Issue happens on the bleeding edge kernel: 6.2.13
>>>>> Switching back to the LTS kernel resolves the issue: 6.1.26
>>>>>
>>>>> I have two monitors attached to the system. One 42 inch 4k Display
>>>>> and a 24 inch 1080p Display and am running sway as my desktop.
>>>>>
>>>>> Let me know if there is more information I could provide to help
>>>>> narrow down the issue.
>>>> Thanks for the report. To be sure the issue doesn't fall through the
>>>> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
>>>> tracking bot:
>>>>
>>>> #regzbot ^introduced v6.1..v6.2
>>>> #regzbot title drm: amdgpu: system becomes unpleasant to use after
>>>> monitor starts blinking and turns full white
>>>> #regzbot ignore-activity
>>>>
>>>> This isn't a regression? This issue or a fix for it are already
>>>> discussed somewhere else? It was fixed already? You want to clarify
>>>> when
>>>> the regression started to happen? Or point out I got the title or
>>>> something else totally wrong? Then just reply and tell me -- ideally
>>>> while also telling regzbot about it, as explained by the page listed in
>>>> the footer of this mail.
>>>>
>>>> Developers: When fixing the issue, remember to add 'Link:' tags
>>>> pointing
>>>> to the report (the parent of this mail). See page linked in footer for
>>>> details.
>>> This sounds exactly like the issue that was fixed in this patch which
>>> is already on its way to Linus:
>>> https://gitlab.freedesktop.org/agd5f/linux/-/commit/08da182175db4c7f80850354849d95f2670e8cd9
>> FWIW, you in the flood of emails likely missed that this is the same
>> thread where you yesterday replied "If the module parameter didn't help
>> then perhaps you are seeing some other issue.  Can you bisect?". That's
>> why I decided to add this to the tracking. Or am I missing something
>> obvious here?
>>
>> /me looks around again and can't see anything, but that doesn't have to
>> mean anything...
>>
>> Felix, btw, this guide might help you with the bisection, even if it's
>> just for kernel compilation:
>>
>> https://docs.kernel.org/next/admin-guide/quickly-build-trimmed-linux.html
>>
>> And to indirectly reply to your mail from yesterday[1]. You might want
>> to ignore the arch linux kernel git repo and just do a bisection between
>> 6.1 and the latest 6.2.y kernel using upstream repos; and if I were you
>> I'd also try 6.3 or even mainline before that, in case the issue was
>> fixed already.
>>
>> [1]
>> https://lore.kernel.org/all/04749ee4-0728-92fe-bcb0-a7320279e...@felixrichter.tech/
>>
> Thanks for the pointers, I'll do a bisection on my desktop from 6.1 to
> the newest commit.

FWIW, I wonder what you actually mean with "newest commit" here: a
bisection between 6.1 and mainline HEAD might be a waste of time, *if*
this is something that only happens in 6.2.y (say due to a broken or
incomplete backport)

> That was the part I was mostly unsure about … where
> to start from.
> 
> I was planning to use PKGBUILD scripts from arch to achieve the same
> configuration as I would when installing
> the package and just rewrite the script to use a local copy of the
> source code instead of the repository.
> That way I can just use the bisect command, rebuild the package and test
> again.

In my experience trying to deal with Linux distro's package managers
creates more trouble than it's worth.

> But I probably won't be able to finish it this week, since I am on
> vacation starting tomorrow and will not have access to the computer in
> question. I will be back next week, by that time the patch Alex is
> talking about might
> already be in mainline. So if that fixes it, I will notice and let you
> know. If not I will do the bisection to figure out what the actual issue
> is.

Enjoy your vacation!

Ciao, Thorsten


Re: PROBLEM: AMD Ryzen 9 7950X iGPU - Blinking Issue

2023-05-02 Thread Linux regression tracking (Thorsten Leemhuis)
On 02.05.23 15:13, Alex Deucher wrote:
> On Tue, May 2, 2023 at 7:45 AM Linux regression tracking (Thorsten
> Leemhuis)  wrote:
>
>> On 30.04.23 13:44, Felix Richter wrote:
>>> Hi,
>>>
>>> I am running into an issue with the integrated GPU of the Ryzen 9 7950X. It 
>>> seems to be a regression from kernel version 6.1 to 6.2.
>>> The bug materializes in the form of my monitor blinking, meaning it turns full 
>>> white shortly. This happens very often so that the system becomes 
>>> unpleasant to use.
>>>
>>> I am running the Archlinux Kernel:
>>> The Issue happens on the bleeding edge kernel: 6.2.13
>>> Switching back to the LTS kernel resolves the issue: 6.1.26
>>>
>>> I have two monitors attached to the system. One 42 inch 4k Display and a 24 
>>> inch 1080p Display and am running sway as my desktop.
>>>
>>> Let me know if there is more information I could provide to help narrow 
>>> down the issue.
>>
>> Thanks for the report. To be sure the issue doesn't fall through the
>> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
>> tracking bot:
>>
>> #regzbot ^introduced v6.1..v6.2
>> #regzbot title drm: amdgpu: system becomes unpleasant to use after
>> monitor starts blinking and turns full white
>> #regzbot ignore-activity
>>
>> This isn't a regression? This issue or a fix for it are already
>> discussed somewhere else? It was fixed already? You want to clarify when
>> the regression started to happen? Or point out I got the title or
>> something else totally wrong? Then just reply and tell me -- ideally
>> while also telling regzbot about it, as explained by the page listed in
>> the footer of this mail.
>>
>> Developers: When fixing the issue, remember to add 'Link:' tags pointing
>> to the report (the parent of this mail). See page linked in footer for
>> details.
> 
> This sounds exactly like the issue that was fixed in this patch which
> is already on its way to Linus:
> https://gitlab.freedesktop.org/agd5f/linux/-/commit/08da182175db4c7f80850354849d95f2670e8cd9

FWIW, you in the flood of emails likely missed that this is the same
thread where you yesterday replied "If the module parameter didn't help
then perhaps you are seeing some other issue.  Can you bisect?". That's
why I decided to add this to the tracking. Or am I missing something
obvious here?

/me looks around again and can't see anything, but that doesn't have to
mean anything...

Felix, btw, this guide might help you with the bisection, even if it's
just for kernel compilation:

https://docs.kernel.org/next/admin-guide/quickly-build-trimmed-linux.html

And to indirectly reply to your mail from yesterday[1]. You might want
to ignore the arch linux kernel git repo and just do a bisection between
6.1 and the latest 6.2.y kernel using upstream repos; and if I were you
I'd also try 6.3 or even mainline before that, in case the issue was
fixed already.

[1]
https://lore.kernel.org/all/04749ee4-0728-92fe-bcb0-a7320279e...@felixrichter.tech/

Ciao, Thorsten


Re: PROBLEM: AMD Ryzen 9 7950X iGPU - Blinking Issue

2023-05-02 Thread Linux regression tracking (Thorsten Leemhuis)
[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 30.04.23 13:44, Felix Richter wrote:
> Hi,
> 
> I am running into an issue with the integrated GPU of the Ryzen 9 7950X. It 
> seems to be a regression from kernel version 6.1 to 6.2. 
> The bug materializes in the form of my monitor blinking, meaning it turns full 
> white shortly. This happens very often so that the system becomes unpleasant 
> to use.
> 
> I am running the Archlinux Kernel:
> The Issue happens on the bleeding edge kernel: 6.2.13
> Switching back to the LTS kernel resolves the issue: 6.1.26
> 
> I have two monitors attached to the system. One 42 inch 4k Display and a 24 
> inch 1080p Display and am running sway as my desktop.
> 
> Let me know if there is more information I could provide to help narrow down 
> the issue.

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced v6.1..v6.2
#regzbot title drm: amdgpu: system becomes unpleasant to use after
monitor starts blinking and turns full white
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.


Re: [PATCH v3] firmware/sysfb: Fix VESA format selection

2023-04-21 Thread Linux regression tracking (Thorsten Leemhuis)
On 20.04.23 17:57, Pierre Asselin wrote:
> Some legacy BIOSes report no reserved bits in their 32-bit rgb mode,
> breaking the calculation of bits_per_pixel in commit f35cd3fa7729
> ("firmware/sysfb: Fix EFI/VESA format selection").  However they report
> lfb_depth correctly for those modes.  Keep the computation but
> set bits_per_pixel to lfb_depth if the latter is larger.
> 
> v2 fixes the warnings from a max3() macro with arguments of different
> types;  split the bits_per_pixel assignment to avoid uglyfing the code
> with too many casts.
> 
> v3 fixes space and formatting blips pointed out by Javier, and change
> the bit_per_pixel assignment back to a single statement using two casts.
> 
> Link: https://lore.kernel.org/r/4psm6b6lqkz1...@panix3.panix.com
> Link: https://lore.kernel.org/r/20230412150225.3757223-1-javi...@redhat.com
> Link: 
> https://lore.kernel.org/dri-devel/20230418183325.2327-1...@panix.com/T/#u
> Link: 
> https://lore.kernel.org/dri-devel/20230419044834.10816-1...@panix.com/T/#u
> Fixes: f35cd3fa7729 ("firmware/sysfb: Fix EFI/VESA format selection")
> Signed-off-by: Pierre Asselin 

Linus might release the final this weekend and this is among the last
few 6.3 regressions I track. Hence please allow me to ask:

Pierre, Tomas, Javier, et al.: how many "legacy BIOSes" do we suspect
are affected by this? So many that it might be worth delaying the
release by one week? And in case everybody involved might agree that
this patch is ready by today or tomorrow: might it be worth asking Linus
to merge this patch directly[1]?

[FWIW, I highly suspect the answer to the last two questions is "no,
that's definitely not worth it", just wanted to confirm]

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

[1] yes, that's a thing we do:
https://lore.kernel.org/all/CAHk-=wis_qqy4odnynnki5b7qhosmxtoj1jxo5wmb6sruwq...@mail.gmail.com/


Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-03-12 Thread Linux regression tracking (Thorsten Leemhuis)
On 10.03.23 11:20, Karol Herbst wrote:
> On Fri, Mar 10, 2023 at 10:26 AM Chris Clayton wrote:
>>
>> Is it likely that this fix will be submitted to mainline during the
>> ongoing 6.3 development cycle?
>>
> 
> yes, it's already pushed to drm-misc-fixes, which then will go into
> the current devel cycle. I just don't know when it's the next time it
> will be pushed upwards, but it should get there eventually.

FWIW, the fix landed now as 1b9b4f922f96; sadly without a Link: tag to
the report, hence I have to mark this manually as resolved:

#regzbot fix: 1b9b4f922f96108da3bb5d87b2d603f5dfbc5650

> And
> because it also contains a Fixes tag it will be backported to older
> branches as well.

FWIW, nope, that's not enough: you have to tag those explicitly to ensure
backporting, as explained in
Documentation/process/stable-kernel-rules.rst. Greg points that out every
few weeks, recently here for example:

https://lore.kernel.org/all/y6bwpo9s9qbns...@kroah.com/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.


Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-10 Thread Linux regression tracking (Thorsten Leemhuis)
On 10.02.23 20:01, Karol Herbst wrote:
> On Fri, Feb 10, 2023 at 7:35 PM Linux regression tracking (Thorsten
> Leemhuis)  wrote:
>>
>> On 08.02.23 09:48, Chris Clayton wrote:
>>>
>>> I'm assuming that we are not going to see a fix for this regression
>>> before 6.2 is released.
>>
>> Yeah, looks like it. That's unfortunate, but happens. But there is still
>> time to fix it and there is one thing I wonder:
>>
>> Did any of the nouveau developers look at the netconsole captures Chris
>> posted more than a week ago to check if they somehow help to track down
>> the root of this problem?
> 
> I did now and I can't spot anything. I think at this point it would
> make sense to dump the active tasks/threads via sysrq keys to see if
> any is in a weird state preventing the machine from shutting down.

Many thx for looking into it!

Ciao, Thorsten

>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://linux-regtracking.leemhuis.info/about/#tldr
>> If I did something stupid, please tell me, as explained on that page.
>>
>>> Consequently, I've implemented a (very simple) workaround. All that
>>> happens is that in the (sysv) init script that starts and stops SDDM,
>>> the nouveau module is removed once SDDM is stopped. With that in place,
>>> my system no longer freezes on reboot or poweroff.
>>>
>>> Let me know if I can provide any additional diagnostics although, with
>>> the problem seemingly occurring so late in the shutdown process, I may
>>> need help on how to go about capturing.
>>>
>>> Chris
>>>
>>> On 02/02/2023 20:45, Chris Clayton wrote:
>>>>
>>>>
>>>> On 01/02/2023 13:51, Chris Clayton wrote:
>>>>>
>>>>>
>>>>> On 30/01/2023 23:27, Ben Skeggs wrote:
>>>>>> On Tue, 31 Jan 2023 at 09:09, Chris Clayton wrote:
>>>>>>>
>>>>>>> Hi again.
>>>>>>>
>>>>>>> On 30/01/2023 20:19, Chris Clayton wrote:
>>>>>>>> Thanks, Ben.
>>>>>>>
>>>>>>> 
>>>>>>>
>>>>>>>>> Hey,
>>>>>>>>>
>>>>>>>>> This is a complete shot-in-the-dark, as I don't see this behaviour on
>>>>>>>>> *any* of my boards.  Could you try the attached patch please?
>>>>>>>>
>>>>>>>> Unfortunately, the patch made no difference.
>>>>>>>>
>>>>>>>> I've been looking at how the graphics on my laptop is set up, and
>>>>>>>> have a bit of a worry about whether the firmware might be playing
>>>>>>>> a part in this problem. In order to offload video decoding to the
>>>>>>>> NVidia TU117 GPU, it seems the scrubber firmware must be
>>>>>>>> available, but as far as I know, that has not been released by
>>>>>>>> NVidia. To get it to work, I followed what ubuntu have done and
>>>>>>>> the scrubber in /lib/firmware/nvidia/tu117/nvdec/ is a symlink to
>>>>>>>> ../../tu116/nvdec/scrubber.bin. That, of course, means that some
>>>>>>>> of the firmware being loaded is for a different card. I note that
>>>>>>>> processing related to firmware is being changed in the patch.
>>>>>>>> Might my set up be at the root of my problem?
>>>>>>>>
>>>>>>>> I'll have a fiddle and see what I can work out.
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Ben.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>>> Well, my fiddling has got my system rebooting and shutting down
>>>>>>> successfully again. I found that if I delete the symlink to the
>>>>>>> scrubber firmware, reboot and shutdown work again. There are,
>>>>>>> however, a number of other files in the tu117 firmware directory
>>>>>>> tree that are symlinks to actual files in its tu116 counterpart.

Re: linux-6.2-rc4+ hangs on poweroff/reboot: Bisected

2023-02-10 Thread Linux regression tracking (Thorsten Leemhuis)
On 08.02.23 09:48, Chris Clayton wrote:
> 
> I'm assuming that we are not going to see a fix for this regression
> before 6.2 is released.

Yeah, looks like it. That's unfortunate, but happens. But there is still
time to fix it and there is one thing I wonder:

Did any of the nouveau developers look at the netconsole captures Chris
posted more than a week ago to check if they somehow help to track down
the root of this problem?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

> Consequently, I've implemented a (very simple) workaround. All that
> happens is that in the (sysv) init script that starts and stops SDDM,
> the nouveau module is removed once SDDM is stopped. With that in place,
> my system no longer freezes on reboot or poweroff.
> 
> Let me know if I can provide any additional diagnostics although, with
> the problem seemingly occurring so late in the shutdown process, I may
> need help on how to go about capturing.
> 
> Chris
> 
> On 02/02/2023 20:45, Chris Clayton wrote:
>>
>>
>> On 01/02/2023 13:51, Chris Clayton wrote:
>>>
>>>
>>> On 30/01/2023 23:27, Ben Skeggs wrote:
>>>> On Tue, 31 Jan 2023 at 09:09, Chris Clayton wrote:
>>>>>
>>>>> Hi again.
>>>>>
>>>>> On 30/01/2023 20:19, Chris Clayton wrote:
>>>>>> Thanks, Ben.
>>>>>
>>>>>
>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> This is a complete shot-in-the-dark, as I don't see this behaviour on
>>>>>>> *any* of my boards.  Could you try the attached patch please?
>>>>>>
>>>>>> Unfortunately, the patch made no difference.
>>>>>>
>>>>>> I've been looking at how the graphics on my laptop is set up, and
>>>>>> have a bit of a worry about whether the firmware might be playing a
>>>>>> part in this problem. In order to offload video decoding to the
>>>>>> NVidia TU117 GPU, it seems the scrubber firmware must be available,
>>>>>> but as far as I know, that has not been released by NVidia. To get it
>>>>>> to work, I followed what ubuntu have done and the scrubber in
>>>>>> /lib/firmware/nvidia/tu117/nvdec/ is a symlink to
>>>>>> ../../tu116/nvdec/scrubber.bin. That, of course, means that some of
>>>>>> the firmware being loaded is for a different card. I note that
>>>>>> processing related to firmware is being changed in the patch. Might
>>>>>> my set up be at the root of my problem?
>>>>>>
>>>>>> I'll have a fiddle and see what I can work out.
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ben.
>>>>>>>
>>>>>>>>
>>>>>
>>>>> Well, my fiddling has got my system rebooting and shutting down
>>>>> successfully again. I found that if I delete the symlink to the
>>>>> scrubber firmware, reboot and shutdown work again. There are, however,
>>>>> a number of other files in the tu117 firmware directory tree that are
>>>>> symlinks to actual files in its tu116 counterpart. So I deleted all of
>>>>> those too. Unfortunately, the absence of one or more of those symlinks
>>>>> causes Xorg to fail to start. I've reinstated all the links except
>>>>> scrubber and I now have a system that works as it did until I tried to
>>>>> run a kernel that includes the bad commit I identified in my
>>>>> bisection. That includes offloading video decoding to the NVidia card,
>>>>> so whatever I read that said the scrubber firmware was needed seems to
>>>>> have been wrong. I get a new message that (nouveau :01:00.0: fb: VPR
>>>>> locked, but no scrubber binary!), but, hey, we can't have everything.
>>>>>
>>>>> If you still want to get to the bottom of this, let me know what you
>>>>> need me to provide and I'll do my best. I suspect you might want to
>>>>> because there will be an awful lot of Ubuntu-based systems out there
>>>>> with that scrubber.bin symlink in place. On the other hand, it could
>>>>> be quite a while before ubuntu are deploying 6.2 or later kernels.
>>>> The symlinks are correct - whole groups of GPUs share the same FW, and
>>>> we use symlinks in linux-firmware to represent this.
>>>>
>>>> I don't really have any ideas how/why this patch causes issues with
>>>> shutdown - it's a path that only gets executed during initialisation.
>>>> Can you try and capture the kernel log during shutdown ("dmesg -w"
>>>> over ssh? netconsole?), and see if there's any relevant messages
>>>> providing a hint at what's going on?  Alternatively, you could try
>>>> unloading the module (you will have to stop X/wayland/gdm/etc/etc
>>>> first) and seeing if that hangs too.
>>>>
>>>> Ben.
>>>
>>> Sorry for the delay - I've been learning about netconsole and netcat. 
>>> However, I had no success with ssh and netconsole
>>> produced a log with nothing unusual in it.
>>>
>>> Simply stopping Xorg and removing the nouveau module succeeds.
>>>
>>> So, I rebuilt rc6+ after a pull from linus'