Re: [PATCH v5 11/14] media: platform: mtk-mdp3: add mt8195 platform configuration

2023-09-12 Thread Krzysztof Kozlowski
On 13/09/2023 04:08, Moudy Ho (何宗原) wrote:

>> This does not make any sense and such mess at v5 is confusing.
>>
>> Best regards,
>> Krzysztof
>>
> 
> Hi Krzysztof,
> 
> 
> Apologies for the confusion caused by the improper separation of
> patches.
> This occurred because I misunderstood the new warning message
> "DT_SPLIT_BINDING_PATCH: DT binding docs and includes should be a
> separate patch" that I received after running 'checkpatch.pl'.

Yes, separate patch. Patch.

Best regards,
Krzysztof



Re: [PATCH v5 2/3] dt-binding: mediatek: integrate MDP RDMA to one binding

2023-09-12 Thread Krzysztof Kozlowski
On 13/09/2023 05:04, Moudy Ho (何宗原) wrote:
> On Tue, 2023-09-12 at 10:16 +0200, Krzysztof Kozlowski wrote:
>>  On 12/09/2023 09:56, Moudy Ho wrote:
>>> Due to the same hardware design, MDP RDMA needs to
>>> be integrated into the same binding.
>>>
>>
>> Please use subject prefixes matching the subsystem. You can get them
>> for
>> example with `git log --oneline -- DIRECTORY_OR_FILE` on the
>> directory
>> your patch is touching.
>>
>> This applies to entire patchset. It is not dt-binding, but dt-
>> bindings.
>>
>>> Signed-off-by: Moudy Ho 
>>> ---
>>>  .../display/mediatek/mediatek,mdp-rdma.yaml   | 88 -
>> --
>>>  .../bindings/media/mediatek,mdp3-rdma.yaml|  5 +-
>>>  2 files changed, 3 insertions(+), 90 deletions(-)
>>>  delete mode 100644
>> Documentation/devicetree/bindings/display/mediatek/mediatek,mdp-
>> rdma.yaml
>>>
>>> diff --git
>> a/Documentation/devicetree/bindings/display/mediatek/mediatek,mdp-
>> rdma.yaml
>> b/Documentation/devicetree/bindings/display/mediatek/mediatek,mdp-
>> rdma.yaml
>>> deleted file mode 100644
>>> index dd12e2ff685c..
>>> ---
>> a/Documentation/devicetree/bindings/display/mediatek/mediatek,mdp-
>> rdma.yaml
>>> +++ /dev/null
>>> @@ -1,88 +0,0 @@
>>> -# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
>>> -%YAML 1.2
>>> 
>>> -$id: 
>> http://devicetree.org/schemas/display/mediatek/mediatek,mdp-rdma.yaml#
>>> -$schema: http://devicetree.org/meta-schemas/core.yaml#
>>> -
>>> -title: MediaTek MDP RDMA
>>> -
>>> -maintainers:
>>> -  - Chun-Kuang Hu 
>>> -  - Philipp Zabel 
>>> -
>>> -description:
>>> -  The MediaTek MDP RDMA stands for Read Direct Memory Access.
>>> -  It provides real time data to the back-end panel driver, such as
>> DSI,
>>> -  DPI and DP_INTF.
>>> -  It contains one line buffer to store the sufficient pixel data.
>>> -  RDMA device node must be siblings to the central MMSYS_CONFIG
>> node.
>>> -  For a description of the MMSYS_CONFIG binding, see
>>>
>> -  Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.yaml
>> for details.
>>> -
>>> -properties:
>>> -  compatible:
>>> -const: mediatek,mt8195-vdo1-rdma
>>> -
>>> -  reg:
>>> -maxItems: 1
>>> -
>>> -  interrupts:
>>> -maxItems: 1
>>> -
>>> -  power-domains:
>>> -maxItems: 1
>>> -
>>> -  clocks:
>>> -items:
>>> -  - description: RDMA Clock
>>
>> This is different and you did not explain it in commit msg.
>>
>> Another difference - mboxes. Looks like you did not test your DTS...
>>
>> Best regards,
>> Krzysztof
>>
> Hi Krzysztof,
> 
> Sorry for the inconvenience.
> The property you mentioned was removed in [3/3]. This incorrect
> configuration went unnoticed because the test passed with the entire
> series applied.
> It will be rectified in the next version.

Please describe any differences (lost properties etc) in commit msg with
some explanation.

Best regards,
Krzysztof



Re: [PATCH v4 3/5] arch/powerpc: Remove trailing whitespaces

2023-09-12 Thread Philippe Mathieu-Daudé

On 12/9/23 15:49, Thomas Zimmermann wrote:

Fix coding style. No functional changes.

Signed-off-by: Thomas Zimmermann 
---
  arch/powerpc/include/asm/machdep.h | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 



RE: [PATCH 1/8] drm/display/dp: Add helper function to get DSC bpp precision

2023-09-12 Thread Kandpal, Suraj
> Subject: [PATCH 1/8] drm/display/dp: Add helper function to get DSC bpp
> precision
> 
> From: Ankit Nautiyal 
> 
> Add helper to get the DSC bits_per_pixel precision for the DP sink.
> 

I think you forgot to add my reviewed by that I gave in the last revision 😝
Anyways,

LGTM.

Reviewed-by: Suraj Kandpal 

> Signed-off-by: Ankit Nautiyal 
> ---
>  drivers/gpu/drm/display/drm_dp_helper.c | 27 +
>  include/drm/display/drm_dp_helper.h |  1 +
>  2 files changed, 28 insertions(+)
> 
> diff --git a/drivers/gpu/drm/display/drm_dp_helper.c
> b/drivers/gpu/drm/display/drm_dp_helper.c
> index 8a1b64c57dfd..5c23d5b8fc50 100644
> --- a/drivers/gpu/drm/display/drm_dp_helper.c
> +++ b/drivers/gpu/drm/display/drm_dp_helper.c
> @@ -2323,6 +2323,33 @@ int drm_dp_read_desc(struct drm_dp_aux *aux,
> struct drm_dp_desc *desc,  }  EXPORT_SYMBOL(drm_dp_read_desc);
> 
> +/**
> + * drm_dp_dsc_sink_bpp_incr() - Get bits per pixel increment
> + * @dsc_dpcd: DSC capabilities from DPCD
> + *
> + * Returns the bpp precision supported by the DP sink.
> + */
> +u8 drm_dp_dsc_sink_bpp_incr(const u8
> +dsc_dpcd[DP_DSC_RECEIVER_CAP_SIZE])
> +{
> + u8 bpp_increment_dpcd = dsc_dpcd[DP_DSC_BITS_PER_PIXEL_INC -
> +DP_DSC_SUPPORT];
> +
> + switch (bpp_increment_dpcd) {
> + case DP_DSC_BITS_PER_PIXEL_1_16:
> + return 16;
> + case DP_DSC_BITS_PER_PIXEL_1_8:
> + return 8;
> + case DP_DSC_BITS_PER_PIXEL_1_4:
> + return 4;
> + case DP_DSC_BITS_PER_PIXEL_1_2:
> + return 2;
> + case DP_DSC_BITS_PER_PIXEL_1_1:
> + return 1;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(drm_dp_dsc_sink_bpp_incr);
> +
>  /**
>   * drm_dp_dsc_sink_max_slice_count() - Get the max slice count
>   * supported by the DSC sink.
> diff --git a/include/drm/display/drm_dp_helper.h
> b/include/drm/display/drm_dp_helper.h
> index 3369104e2d25..6968d4d87931 100644
> --- a/include/drm/display/drm_dp_helper.h
> +++ b/include/drm/display/drm_dp_helper.h
> @@ -164,6 +164,7 @@ drm_dp_is_branch(const u8
> dpcd[DP_RECEIVER_CAP_SIZE])  }
> 
>  /* DP/eDP DSC support */
> +u8 drm_dp_dsc_sink_bpp_incr(const u8
> +dsc_dpcd[DP_DSC_RECEIVER_CAP_SIZE]);
>  u8 drm_dp_dsc_sink_max_slice_count(const u8
> dsc_dpcd[DP_DSC_RECEIVER_CAP_SIZE],
>  bool is_edp);
>  u8 drm_dp_dsc_sink_line_buf_depth(const u8
> dsc_dpcd[DP_DSC_RECEIVER_CAP_SIZE]);
> --
> 2.25.1
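
As a side note on the helper reviewed above: the returned increment
denominator maps directly to a step size for bpp iteration in x16 (U6.4)
units. A minimal sketch, not kernel code (the function name below is
illustrative):

```c
#include <stdint.h>

/* Convert the denominator returned by a helper like
 * drm_dp_dsc_sink_bpp_incr() (16 = 1/16 bpp precision, 1 = integer-only,
 * 0 = unknown) into a step size in x16 (U6.4) bpp units. */
static uint16_t bppx16_step_from_incr(uint8_t bpp_incr)
{
    if (bpp_incr == 0 || bpp_incr > 16)
        return 16;            /* unknown capability: integer steps only */
    return 16 / bpp_incr;     /* e.g. 16 -> step 1 (1/16 bpp), 8 -> step 2 */
}
```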



[PATCH 4/8] drm/i915/audio : Consider fractional vdsc bpp while computing tu_data

2023-09-12 Thread Mitul Golani
From: Ankit Nautiyal 

MTL+ supports fractional compressed bits_per_pixel, with a precision of
1/16. This compressed bpp is stored in U6.4 format.
Accommodate this precision when calculating the transfer unit data
used for the hblank_early calculation.
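
The scaling can be sanity-checked in isolation: multiplying the numerator
by the x16 bpp and the divisor by 16 leaves tu_data unchanged for integer
bpp. A hedged sketch with made-up clock values, mirroring only the shape
of the driver formula (not its actual code or constants):

```c
#include <stdint.h>

/* tu_data with an integer bpp, shaped like the driver expression. */
static uint64_t tu_data_int(uint64_t pixel_clk, uint64_t bpp,
                            uint64_t link_clk, uint64_t lanes, uint64_t fec)
{
    return (pixel_clk * bpp * 8 * 100) / (link_clk * lanes * fec);
}

/* Same quantity with bpp in U6.4 (x16) units: divisor gains a factor 16. */
static uint64_t tu_data_x16(uint64_t pixel_clk, uint64_t bppx16,
                            uint64_t link_clk, uint64_t lanes, uint64_t fec)
{
    return (pixel_clk * bppx16 * 8 * 100) / (link_clk * lanes * 16 * fec);
}
```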

v2:
-Fixed tu_data calculation while dealing with U6.4 format. (Stan)

Signed-off-by: Ankit Nautiyal 
Reviewed-by: Suraj Kandpal 
---
 drivers/gpu/drm/i915/display/intel_audio.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_audio.c 
b/drivers/gpu/drm/i915/display/intel_audio.c
index 4f1db1581316..3b08be54ce4f 100644
--- a/drivers/gpu/drm/i915/display/intel_audio.c
+++ b/drivers/gpu/drm/i915/display/intel_audio.c
@@ -522,25 +522,25 @@ static unsigned int calc_hblank_early_prog(struct 
intel_encoder *encoder,
unsigned int link_clks_available, link_clks_required;
unsigned int tu_data, tu_line, link_clks_active;
unsigned int h_active, h_total, hblank_delta, pixel_clk;
-   unsigned int fec_coeff, cdclk, vdsc_bpp;
+   unsigned int fec_coeff, cdclk, vdsc_bppx16;
unsigned int link_clk, lanes;
unsigned int hblank_rise;
 
h_active = crtc_state->hw.adjusted_mode.crtc_hdisplay;
h_total = crtc_state->hw.adjusted_mode.crtc_htotal;
pixel_clk = crtc_state->hw.adjusted_mode.crtc_clock;
-   vdsc_bpp = 
intel_fractional_bpp_from_x16(crtc_state->dsc.compressed_bpp_x16);
+   vdsc_bppx16 = crtc_state->dsc.compressed_bpp_x16;
cdclk = i915->display.cdclk.hw.cdclk;
/* fec= 0.972261, using rounding multiplier of 100 */
fec_coeff = 972261;
link_clk = crtc_state->port_clock;
lanes = crtc_state->lane_count;
 
-   drm_dbg_kms(&i915->drm, "h_active = %u link_clk = %u :"
-   "lanes = %u vdsc_bpp = %u cdclk = %u\n",
-   h_active, link_clk, lanes, vdsc_bpp, cdclk);
+   drm_dbg_kms(&i915->drm,
+   "h_active = %u link_clk = %u : lanes = %u vdsc_bppx16 = %u 
cdclk = %u\n",
+   h_active, link_clk, lanes, vdsc_bppx16, cdclk);
 
-   if (WARN_ON(!link_clk || !pixel_clk || !lanes || !vdsc_bpp || !cdclk))
+   if (WARN_ON(!link_clk || !pixel_clk || !lanes || !vdsc_bppx16 || 
!cdclk))
return 0;
 
link_clks_available = (h_total - h_active) * link_clk / pixel_clk - 28;
@@ -552,8 +552,8 @@ static unsigned int calc_hblank_early_prog(struct 
intel_encoder *encoder,
hblank_delta = DIV64_U64_ROUND_UP(mul_u32_u32(5 * (link_clk + 
cdclk), pixel_clk),
  mul_u32_u32(link_clk, cdclk));
 
-   tu_data = div64_u64(mul_u32_u32(pixel_clk * vdsc_bpp * 8, 100),
-   mul_u32_u32(link_clk * lanes, fec_coeff));
+   tu_data = div64_u64(mul_u32_u32(pixel_clk * vdsc_bppx16 * 8, 100),
+   mul_u32_u32(link_clk * lanes * 16, fec_coeff));
tu_line = div64_u64(h_active * mul_u32_u32(link_clk, fec_coeff),
mul_u32_u32(64 * pixel_clk, 100));
link_clks_active  = (tu_line - 1) * 64 + tu_data;
-- 
2.25.1



[PATCH 8/8] drm/i915/dsc: Allow DSC only with fractional bpp when forced from debugfs

2023-09-12 Thread Mitul Golani
From: Swati Sharma 

If force_dsc_fractional_bpp_en is set through debugfs, allow DSC only if
the compressed bpp is fractional. Continue (skip the candidate) if the
computed compressed bpp turns out to be an integer.
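
The gate described here reduces to checking the low four bits of the U6.4
value. A minimal sketch (the helper name is illustrative, not the one the
series adds):

```c
#include <stdbool.h>
#include <stdint.h>

/* True when an x16 (U6.4) bpp value carries a fractional part. */
static bool bpp_x16_has_fraction(uint16_t bppx16)
{
    return (bppx16 & 0xf) != 0;
}
```

With this, the loop simply does
`if (force_fractional && !bpp_x16_has_fraction(bppx16)) continue;`.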

v2:
-Use helpers for fractional, integral bits of bits_per_pixel. (Suraj)
-Fix comment (Suraj)

Signed-off-by: Swati Sharma 
Reviewed-by: Suraj Kandpal 
---
 drivers/gpu/drm/i915/display/intel_dp.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
b/drivers/gpu/drm/i915/display/intel_dp.c
index d6c29006b816..354d78593a5f 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -1906,6 +1906,9 @@ xelpd_dsc_compute_link_config(struct intel_dp *intel_dp,
for (compressed_bppx16 = dsc_max_bpp;
 compressed_bppx16 >= dsc_min_bpp;
 compressed_bppx16 -= bppx16_step) {
+   if (intel_dp->force_dsc_fractional_bpp_en &&
+   !intel_fractional_bpp_decimal(compressed_bppx16))
+   continue;
ret = dsc_compute_link_config(intel_dp,
  pipe_config,
  limits,
@@ -1913,6 +1916,10 @@ xelpd_dsc_compute_link_config(struct intel_dp *intel_dp,
  timeslots);
if (ret == 0) {
pipe_config->dsc.compressed_bpp_x16 = compressed_bppx16;
+   if (intel_dp->force_dsc_fractional_bpp_en &&
+   intel_fractional_bpp_decimal(compressed_bppx16))
+   drm_dbg_kms(&i915->drm, "Forcing DSC fractional 
bpp\n");
+
return 0;
}
}
-- 
2.25.1



[PATCH 5/8] drm/i915/dsc/mtl: Add support for fractional bpp

2023-09-12 Thread Mitul Golani
From: Vandita Kulkarni 

Consider the fractional bpp while reading the QP values.

v2: Use helpers for fractional, integral bits of bits_per_pixel. (Suraj)
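
The QP tables step in 0.5 bpp increments, so with bpp held in x16 units
the row index falls out of integer arithmetic. A hedged sketch of the
indexing rule described in the diff comments below (not the exact driver
expression):

```c
#include <stdint.h>

/* Row index for tables that start at 6.0 bpp and advance by 0.5 bpp:
 * 6.0..6.4375 -> 0, 6.5..6.9375 -> 1, and so on. 0.5 bpp == 8 x16 units. */
static int qp_row_index(uint16_t bppx16)
{
    return (bppx16 - 6 * 16) / 8;
}
```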

Signed-off-by: Vandita Kulkarni 
Signed-off-by: Ankit Nautiyal 
Reviewed-by: Suraj Kandpal 
---
 .../gpu/drm/i915/display/intel_qp_tables.c|  3 ---
 drivers/gpu/drm/i915/display/intel_vdsc.c | 25 +++
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_qp_tables.c 
b/drivers/gpu/drm/i915/display/intel_qp_tables.c
index 543cdc46aa1d..600c815e37e4 100644
--- a/drivers/gpu/drm/i915/display/intel_qp_tables.c
+++ b/drivers/gpu/drm/i915/display/intel_qp_tables.c
@@ -34,9 +34,6 @@
  * These qp tables are as per the C model
  * and it has the rows pointing to bpps which increment
  * in steps of 0.5
- * We do not support fractional bpps as of today,
- * hence we would skip the fractional bpps during
- * our references for qp calclulations.
  */
 static const u8 
rc_range_minqp444_8bpc[DSC_NUM_BUF_RANGES][RC_RANGE_QP444_8BPC_MAX_NUM_BPP] = {
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
diff --git a/drivers/gpu/drm/i915/display/intel_vdsc.c 
b/drivers/gpu/drm/i915/display/intel_vdsc.c
index 1bd9391a6f5a..2c19078fbce8 100644
--- a/drivers/gpu/drm/i915/display/intel_vdsc.c
+++ b/drivers/gpu/drm/i915/display/intel_vdsc.c
@@ -78,8 +78,8 @@ intel_vdsc_set_min_max_qp(struct drm_dsc_config *vdsc_cfg, 
int buf,
 static void
 calculate_rc_params(struct drm_dsc_config *vdsc_cfg)
 {
+   int bpp = intel_fractional_bpp_from_x16(vdsc_cfg->bits_per_pixel);
int bpc = vdsc_cfg->bits_per_component;
-   int bpp = vdsc_cfg->bits_per_pixel >> 4;
int qp_bpc_modifier = (bpc - 8) * 2;
int uncompressed_bpg_rate;
int first_line_bpg_offset;
@@ -149,7 +149,13 @@ calculate_rc_params(struct drm_dsc_config *vdsc_cfg)
static const s8 ofs_und8[] = {
10, 8, 6, 4, 2, 0, -2, -4, -6, -8, -10, -10, -12, -12, 
-12
};
-
+   /*
+* For 420 format since bits_per_pixel (bpp) is set to target 
bpp * 2,
+* QP table values for target bpp 4.0 to 4.4375 (rounded to 
4.0) are
+* actually for bpp 8 to 8.875 (rounded to 4.0 * 2 i.e 8).
+* Similarly values for target bpp 4.5 to 4.8375 (rounded to 
4.5)
+* are for bpp 9 to 9.875 (rounded to 4.5 * 2 i.e 9), and so on.
+*/
bpp_i  = bpp - 8;
for (buf_i = 0; buf_i < DSC_NUM_BUF_RANGES; buf_i++) {
u8 range_bpg_offset;
@@ -179,6 +185,9 @@ calculate_rc_params(struct drm_dsc_config *vdsc_cfg)
range_bpg_offset & DSC_RANGE_BPG_OFFSET_MASK;
}
} else {
+   /* fractional bpp part * 1 (for precision up to 4 decimal 
places) */
+   int fractional_bits = 
intel_fractional_bpp_decimal(vdsc_cfg->bits_per_pixel);
+
static const s8 ofs_und6[] = {
0, -2, -2, -4, -6, -6, -8, -8, -8, -10, -10, -12, -12, 
-12, -12
};
@@ -192,7 +201,14 @@ calculate_rc_params(struct drm_dsc_config *vdsc_cfg)
10, 8, 6, 4, 2, 0, -2, -4, -6, -8, -10, -10, -12, -12, 
-12
};
 
-   bpp_i  = (2 * (bpp - 6));
+   /*
+* QP table rows have values in increment of 0.5.
+* So 6.0 bpp to 6.4375 will have index 0, 6.5 to 6.9375 will 
have index 1,
+* and so on.
+* 0.5 fractional part with 4 decimal precision becomes 5000
+*/
+   bpp_i  = ((bpp - 6) + (fractional_bits < 5000 ? 0 : 1));
+
for (buf_i = 0; buf_i < DSC_NUM_BUF_RANGES; buf_i++) {
u8 range_bpg_offset;
 
@@ -280,8 +296,7 @@ int intel_dsc_compute_params(struct intel_crtc_state 
*pipe_config)
/* Gen 11 does not support VBR */
vdsc_cfg->vbr_enable = false;
 
-   /* Gen 11 only supports integral values of bpp */
-   vdsc_cfg->bits_per_pixel = compressed_bpp << 4;
+   vdsc_cfg->bits_per_pixel = pipe_config->dsc.compressed_bpp_x16;
 
/*
 * According to DSC 1.2 specs in Section 4.1 if native_420 is set
-- 
2.25.1



[PATCH 7/8] drm/i915/dsc: Add debugfs entry to validate DSC fractional bpp

2023-09-12 Thread Mitul Golani
From: Swati Sharma 

A DSC_Sink_BPP_Precision entry is added to i915_dsc_fec_support_show
to report the sink's precision.
Also, a new debugfs entry is created to enforce fractional bpp.
If Force_DSC_Fractional_BPP_en is set, then while iterating over
output bpp with a fractional step size, we will skip any output_bpp that
is computed as an integer. With this approach, we can validate
DSC with fractional bpp.

v2:
Add drm_modeset_unlock to new line(Suraj)

Signed-off-by: Swati Sharma 
Signed-off-by: Ankit Nautiyal 
Signed-off-by: Mitul Golani 
Reviewed-by: Suraj Kandpal 
---
 .../drm/i915/display/intel_display_debugfs.c  | 83 +++
 .../drm/i915/display/intel_display_types.h|  1 +
 2 files changed, 84 insertions(+)

diff --git a/drivers/gpu/drm/i915/display/intel_display_debugfs.c 
b/drivers/gpu/drm/i915/display/intel_display_debugfs.c
index f05b52381a83..776ab96def1f 100644
--- a/drivers/gpu/drm/i915/display/intel_display_debugfs.c
+++ b/drivers/gpu/drm/i915/display/intel_display_debugfs.c
@@ -1244,6 +1244,8 @@ static int i915_dsc_fec_support_show(struct seq_file *m, 
void *data)
  
DP_DSC_YCbCr420_Native)),
   
str_yes_no(drm_dp_dsc_sink_supports_format(intel_dp->dsc_dpcd,
  
DP_DSC_YCbCr444)));
+   seq_printf(m, "DSC_Sink_BPP_Precision: %d\n",
+  drm_dp_dsc_sink_bpp_incr(intel_dp->dsc_dpcd));
seq_printf(m, "Force_DSC_Enable: %s\n",
   str_yes_no(intel_dp->force_dsc_en));
if (!intel_dp_is_edp(intel_dp))
@@ -1436,6 +1438,84 @@ static const struct file_operations 
i915_dsc_output_format_fops = {
.write = i915_dsc_output_format_write
 };
 
+static int i915_dsc_fractional_bpp_show(struct seq_file *m, void *data)
+{
+   struct drm_connector *connector = m->private;
+   struct drm_device *dev = connector->dev;
+   struct drm_crtc *crtc;
+   struct intel_dp *intel_dp;
+   struct intel_encoder *encoder = 
intel_attached_encoder(to_intel_connector(connector));
+   int ret;
+
+   if (!encoder)
+   return -ENODEV;
+
+   ret = 
drm_modeset_lock_single_interruptible(&dev->mode_config.connection_mutex);
+   if (ret)
+   return ret;
+
+   crtc = connector->state->crtc;
+   if (connector->status != connector_status_connected || !crtc) {
+   ret = -ENODEV;
+   goto out;
+   }
+
+   intel_dp = intel_attached_dp(to_intel_connector(connector));
+   seq_printf(m, "Force_DSC_Fractional_BPP_Enable: %s\n",
+  str_yes_no(intel_dp->force_dsc_fractional_bpp_en));
+
+out:
+   drm_modeset_unlock(&dev->mode_config.connection_mutex);
+
+   return ret;
+}
+
+static ssize_t i915_dsc_fractional_bpp_write(struct file *file,
+const char __user *ubuf,
+size_t len, loff_t *offp)
+{
+   struct drm_connector *connector =
+   ((struct seq_file *)file->private_data)->private;
+   struct intel_encoder *encoder = 
intel_attached_encoder(to_intel_connector(connector));
+   struct drm_i915_private *i915 = to_i915(encoder->base.dev);
+   struct intel_dp *intel_dp = enc_to_intel_dp(encoder);
+   bool dsc_fractional_bpp_enable = false;
+   int ret;
+
+   if (len == 0)
+   return 0;
+
+   drm_dbg(&i915->drm,
+   "Copied %zu bytes from user to force fractional bpp for DSC\n", 
len);
+
+   ret = kstrtobool_from_user(ubuf, len, &dsc_fractional_bpp_enable);
+   if (ret < 0)
+   return ret;
+
+   drm_dbg(&i915->drm, "Got %s for DSC Fractional BPP Enable\n",
+   (dsc_fractional_bpp_enable) ? "true" : "false");
+   intel_dp->force_dsc_fractional_bpp_en = dsc_fractional_bpp_enable;
+
+   *offp += len;
+
+   return len;
+}
+
+static int i915_dsc_fractional_bpp_open(struct inode *inode,
+   struct file *file)
+{
+   return single_open(file, i915_dsc_fractional_bpp_show, 
inode->i_private);
+}
+
+static const struct file_operations i915_dsc_fractional_bpp_fops = {
+   .owner = THIS_MODULE,
+   .open = i915_dsc_fractional_bpp_open,
+   .read = seq_read,
+   .llseek = seq_lseek,
+   .release = single_release,
+   .write = i915_dsc_fractional_bpp_write
+};
+
 /*
  * Returns the Current CRTC's bpc.
  * Example usage: cat /sys/kernel/debug/dri/0/crtc-0/i915_current_bpc
@@ -1513,6 +1593,9 @@ void intel_connector_debugfs_add(struct intel_connector 
*intel_connector)
 
debugfs_create_file("i915_dsc_output_format", 0644, root,
connector, &i915_dsc_output_format_fops);
+
+   debugfs_create_file("i915_dsc_fractional_bpp", 0644, root,
+  

[PATCH 2/8] drm/i915/display: Store compressed bpp in U6.4 format

2023-09-12 Thread Mitul Golani
From: Ankit Nautiyal 

DSC parameter bits_per_pixel is stored in U6.4 format.
The 4 bits represent the fractional part of the bpp.
Currently we use the compressed_bpp member of the dsc structure to store
only the integral part of the bits_per_pixel.
To store the full bits_per_pixel along with the fractional part,
compressed_bpp is changed to store bpp in U6.4 format. The integral
part is retrieved by simply right-shifting the member compressed_bpp by 4.
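
A minimal sketch of U6.4 fixed-point helpers like those the commit
message describes (the macro names here are illustrative; the patch adds
its own variants in intel_fractional_helper.h):

```c
#include <stdint.h>

#define BPP_X16(bpp)      ((uint16_t)((bpp) << 4))   /* integer bpp -> U6.4 */
#define BPP_INT(bppx16)   ((bppx16) >> 4)            /* integral part */
#define BPP_FRAC(bppx16)  ((bppx16) & 0xf)           /* fraction, in 1/16 */
```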

v2:
-Use to_bpp_int, to_bpp_frac_dec, to_bpp_x16 helpers while dealing
 with compressed bpp. (Suraj)
-Fix comment styling. (Suraj)

v3:
-Add separate file for 6.4 fixed point helper(Jani, Nikula)
-Add comment for magic values(Suraj)

v4:
-Fix checkpatch caused due to renaming(Suraj)

Signed-off-by: Ankit Nautiyal 
Signed-off-by: Mitul Golani 
Reviewed-by: Suraj Kandpal 
---
 drivers/gpu/drm/i915/display/icl_dsi.c| 11 +++---
 drivers/gpu/drm/i915/display/intel_audio.c|  3 +-
 drivers/gpu/drm/i915/display/intel_bios.c |  6 ++--
 drivers/gpu/drm/i915/display/intel_cdclk.c|  6 ++--
 drivers/gpu/drm/i915/display/intel_display.c  |  2 +-
 .../drm/i915/display/intel_display_types.h|  3 +-
 drivers/gpu/drm/i915/display/intel_dp.c   | 33 ++---
 drivers/gpu/drm/i915/display/intel_dp_mst.c   | 26 --
 .../i915/display/intel_fractional_helper.h| 36 +++
 drivers/gpu/drm/i915/display/intel_vdsc.c |  5 +--
 10 files changed, 93 insertions(+), 38 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/display/intel_fractional_helper.h

diff --git a/drivers/gpu/drm/i915/display/icl_dsi.c 
b/drivers/gpu/drm/i915/display/icl_dsi.c
index ad6488e9c2b2..0f7594b6aa1f 100644
--- a/drivers/gpu/drm/i915/display/icl_dsi.c
+++ b/drivers/gpu/drm/i915/display/icl_dsi.c
@@ -43,6 +43,7 @@
 #include "intel_de.h"
 #include "intel_dsi.h"
 #include "intel_dsi_vbt.h"
+#include "intel_fractional_helper.h"
 #include "intel_panel.h"
 #include "intel_vdsc.h"
 #include "intel_vdsc_regs.h"
@@ -330,7 +331,7 @@ static int afe_clk(struct intel_encoder *encoder,
int bpp;
 
if (crtc_state->dsc.compression_enable)
-   bpp = crtc_state->dsc.compressed_bpp;
+   bpp = 
intel_fractional_bpp_from_x16(crtc_state->dsc.compressed_bpp_x16);
else
bpp = mipi_dsi_pixel_format_to_bpp(intel_dsi->pixel_format);
 
@@ -860,7 +861,7 @@ gen11_dsi_set_transcoder_timings(struct intel_encoder 
*encoder,
 * compressed and non-compressed bpp.
 */
if (crtc_state->dsc.compression_enable) {
-   mul = crtc_state->dsc.compressed_bpp;
+   mul = 
intel_fractional_bpp_from_x16(crtc_state->dsc.compressed_bpp_x16);
div = mipi_dsi_pixel_format_to_bpp(intel_dsi->pixel_format);
}
 
@@ -884,7 +885,7 @@ gen11_dsi_set_transcoder_timings(struct intel_encoder 
*encoder,
int bpp, line_time_us, byte_clk_period_ns;
 
if (crtc_state->dsc.compression_enable)
-   bpp = crtc_state->dsc.compressed_bpp;
+   bpp = 
intel_fractional_bpp_from_x16(crtc_state->dsc.compressed_bpp_x16);
else
bpp = 
mipi_dsi_pixel_format_to_bpp(intel_dsi->pixel_format);
 
@@ -1451,8 +1452,8 @@ static void gen11_dsi_get_timings(struct intel_encoder 
*encoder,
struct drm_display_mode *adjusted_mode =
&pipe_config->hw.adjusted_mode;
 
-   if (pipe_config->dsc.compressed_bpp) {
-   int div = pipe_config->dsc.compressed_bpp;
+   if (pipe_config->dsc.compressed_bpp_x16) {
+   int div = 
intel_fractional_bpp_from_x16(pipe_config->dsc.compressed_bpp_x16);
int mul = mipi_dsi_pixel_format_to_bpp(intel_dsi->pixel_format);
 
adjusted_mode->crtc_htotal =
diff --git a/drivers/gpu/drm/i915/display/intel_audio.c 
b/drivers/gpu/drm/i915/display/intel_audio.c
index 19605264a35c..4f1db1581316 100644
--- a/drivers/gpu/drm/i915/display/intel_audio.c
+++ b/drivers/gpu/drm/i915/display/intel_audio.c
@@ -35,6 +35,7 @@
 #include "intel_crtc.h"
 #include "intel_de.h"
 #include "intel_display_types.h"
+#include "intel_fractional_helper.h"
 #include "intel_lpe_audio.h"
 
 /**
@@ -528,7 +529,7 @@ static unsigned int calc_hblank_early_prog(struct 
intel_encoder *encoder,
h_active = crtc_state->hw.adjusted_mode.crtc_hdisplay;
h_total = crtc_state->hw.adjusted_mode.crtc_htotal;
pixel_clk = crtc_state->hw.adjusted_mode.crtc_clock;
-   vdsc_bpp = crtc_state->dsc.compressed_bpp;
+   vdsc_bpp = 
intel_fractional_bpp_from_x16(crtc_state->dsc.compressed_bpp_x16);
cdclk = i915->display.cdclk.hw.cdclk;
/* fec= 0.972261, using rounding multiplier of 100 */
fec_coeff = 972261;
diff --git a/drivers/gpu/drm/i915/display/intel_bios.c 
b/drivers/gpu/drm/i915/display/intel_bios.c
index f735b035436c..3e4a3c62fc8a 100644
--- a/drivers/gpu/drm/i915/displ

[PATCH 6/8] drm/i915/dp: Iterate over output bpp with fractional step size

2023-09-12 Thread Mitul Golani
From: Ankit Nautiyal 

This patch adds support to iterate over the compressed output bpp in
fractional steps, as supported by the DP sink.

v2:
-Avoid ending up with compressed bpp, same as pipe bpp. (Stan)
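
The fractional-step search can be pictured as a countdown in x16 units.
A hedged sketch (the helper and bounds are illustrative, not the driver's
code):

```c
#include <stdint.h>

/* Count the candidate compressed bpp values a max-to-min walk would
 * visit with the given x16 step; step 1 corresponds to 1/16 bpp
 * precision, step 16 to integer-only bpp. */
static int count_bpp_candidates(uint16_t min_x16, uint16_t max_x16,
                                uint16_t step)
{
    int n = 0;
    for (uint16_t bppx16 = max_x16; bppx16 >= min_x16; bppx16 -= step)
        n++;
    return n;
}
```

With a sink advertising 1/16 precision, the range 8..12 bpp yields 65
candidates instead of the 5 that integer steps would try.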

Signed-off-by: Ankit Nautiyal 
Reviewed-by: Suraj Kandpal 
---
 drivers/gpu/drm/i915/display/intel_dp.c | 38 +++--
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
b/drivers/gpu/drm/i915/display/intel_dp.c
index 6e09e21909a1..d6c29006b816 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -1716,15 +1716,15 @@ static bool intel_dp_dsc_supports_format(struct 
intel_dp *intel_dp,
return drm_dp_dsc_sink_supports_format(intel_dp->dsc_dpcd, 
sink_dsc_format);
 }
 
-static bool is_bw_sufficient_for_dsc_config(u16 compressed_bpp, u32 link_clock,
+static bool is_bw_sufficient_for_dsc_config(u16 compressed_bppx16, u32 
link_clock,
u32 lane_count, u32 mode_clock,
enum intel_output_format 
output_format,
int timeslots)
 {
u32 available_bw, required_bw;
 
-   available_bw = (link_clock * lane_count * timeslots)  / 8;
-   required_bw = compressed_bpp * (intel_dp_mode_to_fec_clock(mode_clock));
+   available_bw = (link_clock * lane_count * timeslots * 16)  / 8;
+   required_bw = compressed_bppx16 * 
(intel_dp_mode_to_fec_clock(mode_clock));
 
return available_bw > required_bw;
 }
@@ -1732,7 +1732,7 @@ static bool is_bw_sufficient_for_dsc_config(u16 
compressed_bpp, u32 link_clock,
 static int dsc_compute_link_config(struct intel_dp *intel_dp,
   struct intel_crtc_state *pipe_config,
   struct link_config_limits *limits,
-  u16 compressed_bpp,
+  u16 compressed_bppx16,
   int timeslots)
 {
const struct drm_display_mode *adjusted_mode = 
&pipe_config->hw.adjusted_mode;
@@ -1747,8 +1747,8 @@ static int dsc_compute_link_config(struct intel_dp 
*intel_dp,
for (lane_count = limits->min_lane_count;
 lane_count <= limits->max_lane_count;
 lane_count <<= 1) {
-   if (!is_bw_sufficient_for_dsc_config(compressed_bpp, 
link_rate, lane_count,
-
adjusted_mode->clock,
+   if (!is_bw_sufficient_for_dsc_config(compressed_bppx16, 
link_rate,
+lane_count, 
adjusted_mode->clock,
 
pipe_config->output_format,
 timeslots))
continue;
@@ -1861,7 +1861,7 @@ icl_dsc_compute_link_config(struct intel_dp *intel_dp,
ret = dsc_compute_link_config(intel_dp,
  pipe_config,
  limits,
- valid_dsc_bpp[i],
+ valid_dsc_bpp[i] << 4,
  timeslots);
if (ret == 0) {
pipe_config->dsc.compressed_bpp_x16 =
@@ -1888,23 +1888,31 @@ xelpd_dsc_compute_link_config(struct intel_dp *intel_dp,
  int pipe_bpp,
  int timeslots)
 {
-   u16 compressed_bpp;
+   u8 bppx16_incr = drm_dp_dsc_sink_bpp_incr(intel_dp->dsc_dpcd);
+   struct drm_i915_private *i915 = dp_to_i915(intel_dp);
+   u16 compressed_bppx16;
+   u8 bppx16_step;
int ret;
 
+   if (DISPLAY_VER(i915) < 14 || bppx16_incr <= 1)
+   bppx16_step = 16;
+   else
+   bppx16_step = 16 / bppx16_incr;
+
/* Compressed BPP should be less than the Input DSC bpp */
-   dsc_max_bpp = min(dsc_max_bpp, pipe_bpp - 1);
+   dsc_max_bpp = min(dsc_max_bpp << 4, (pipe_bpp << 4) - bppx16_step);
+   dsc_min_bpp = dsc_min_bpp << 4;
 
-   for (compressed_bpp = dsc_max_bpp;
-compressed_bpp >= dsc_min_bpp;
-compressed_bpp--) {
+   for (compressed_bppx16 = dsc_max_bpp;
+compressed_bppx16 >= dsc_min_bpp;
+compressed_bppx16 -= bppx16_step) {
ret = dsc_compute_link_config(intel_dp,
  pipe_config,
  limits,
- compressed_bpp,
+ compressed_bppx16,
  timeslots);
if (ret == 0) {
-   pipe_config->dsc.compresse

[PATCH 1/8] drm/display/dp: Add helper function to get DSC bpp precision

2023-09-12 Thread Mitul Golani
From: Ankit Nautiyal 

Add helper to get the DSC bits_per_pixel precision for the DP sink.

Signed-off-by: Ankit Nautiyal 
---
 drivers/gpu/drm/display/drm_dp_helper.c | 27 +
 include/drm/display/drm_dp_helper.h |  1 +
 2 files changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/display/drm_dp_helper.c 
b/drivers/gpu/drm/display/drm_dp_helper.c
index 8a1b64c57dfd..5c23d5b8fc50 100644
--- a/drivers/gpu/drm/display/drm_dp_helper.c
+++ b/drivers/gpu/drm/display/drm_dp_helper.c
@@ -2323,6 +2323,33 @@ int drm_dp_read_desc(struct drm_dp_aux *aux, struct 
drm_dp_desc *desc,
 }
 EXPORT_SYMBOL(drm_dp_read_desc);
 
+/**
+ * drm_dp_dsc_sink_bpp_incr() - Get bits per pixel increment
+ * @dsc_dpcd: DSC capabilities from DPCD
+ *
+ * Returns the bpp precision supported by the DP sink.
+ */
+u8 drm_dp_dsc_sink_bpp_incr(const u8 dsc_dpcd[DP_DSC_RECEIVER_CAP_SIZE])
+{
+   u8 bpp_increment_dpcd = dsc_dpcd[DP_DSC_BITS_PER_PIXEL_INC - 
DP_DSC_SUPPORT];
+
+   switch (bpp_increment_dpcd) {
+   case DP_DSC_BITS_PER_PIXEL_1_16:
+   return 16;
+   case DP_DSC_BITS_PER_PIXEL_1_8:
+   return 8;
+   case DP_DSC_BITS_PER_PIXEL_1_4:
+   return 4;
+   case DP_DSC_BITS_PER_PIXEL_1_2:
+   return 2;
+   case DP_DSC_BITS_PER_PIXEL_1_1:
+   return 1;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL(drm_dp_dsc_sink_bpp_incr);
+
 /**
  * drm_dp_dsc_sink_max_slice_count() - Get the max slice count
  * supported by the DSC sink.
diff --git a/include/drm/display/drm_dp_helper.h 
b/include/drm/display/drm_dp_helper.h
index 3369104e2d25..6968d4d87931 100644
--- a/include/drm/display/drm_dp_helper.h
+++ b/include/drm/display/drm_dp_helper.h
@@ -164,6 +164,7 @@ drm_dp_is_branch(const u8 dpcd[DP_RECEIVER_CAP_SIZE])
 }
 
 /* DP/eDP DSC support */
+u8 drm_dp_dsc_sink_bpp_incr(const u8 dsc_dpcd[DP_DSC_RECEIVER_CAP_SIZE]);
 u8 drm_dp_dsc_sink_max_slice_count(const u8 dsc_dpcd[DP_DSC_RECEIVER_CAP_SIZE],
   bool is_edp);
 u8 drm_dp_dsc_sink_line_buf_depth(const u8 dsc_dpcd[DP_DSC_RECEIVER_CAP_SIZE]);
-- 
2.25.1



[PATCH 3/8] drm/i915/display: Consider fractional vdsc bpp while computing m_n values

2023-09-12 Thread Mitul Golani
From: Ankit Nautiyal 

MTL+ supports fractional compressed bits_per_pixel, with a precision of
1/16. This compressed bpp is stored in U6.4 format.
Accommodate this precision while computing the m_n values.
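
The m_n adjustment amounts to undoing the x16 scaling in the data-clock
product, rounding up. A hedged sketch with made-up values (DIV_ROUND_UP
mirrors the kernel macro of the same name):

```c
#include <stdint.h>

#define DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))

/* data_clock with bpp either as a plain integer or in x16 (U6.4) units. */
static uint32_t calc_data_clock(uint32_t bits_per_pixel, uint32_t pixel_clock,
                                int bpp_is_x16)
{
    if (bpp_is_x16)
        return (uint32_t)DIV_ROUND_UP((uint64_t)bits_per_pixel * pixel_clock,
                                      16);
    return bits_per_pixel * pixel_clock;
}
```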

Signed-off-by: Ankit Nautiyal 
Reviewed-by: Suraj Kandpal 
---
 drivers/gpu/drm/i915/display/intel_display.c | 6 +-
 drivers/gpu/drm/i915/display/intel_display.h | 2 +-
 drivers/gpu/drm/i915/display/intel_dp.c  | 5 +++--
 drivers/gpu/drm/i915/display/intel_dp_mst.c  | 6 --
 drivers/gpu/drm/i915/display/intel_fdi.c | 2 +-
 5 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
b/drivers/gpu/drm/i915/display/intel_display.c
index afcbdd4f105a..b37aeac961f4 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -2380,10 +2380,14 @@ void
 intel_link_compute_m_n(u16 bits_per_pixel, int nlanes,
   int pixel_clock, int link_clock,
   struct intel_link_m_n *m_n,
-  bool fec_enable)
+  bool fec_enable,
+  bool is_dsc_fractional_bpp)
 {
u32 data_clock = bits_per_pixel * pixel_clock;
 
+   if (is_dsc_fractional_bpp)
+   data_clock = DIV_ROUND_UP(bits_per_pixel * pixel_clock, 16);
+
if (fec_enable)
data_clock = intel_dp_mode_to_fec_clock(data_clock);
 
diff --git a/drivers/gpu/drm/i915/display/intel_display.h 
b/drivers/gpu/drm/i915/display/intel_display.h
index 49ac8473b988..a4c4ca3cad65 100644
--- a/drivers/gpu/drm/i915/display/intel_display.h
+++ b/drivers/gpu/drm/i915/display/intel_display.h
@@ -398,7 +398,7 @@ u8 intel_calc_active_pipes(struct intel_atomic_state *state,
 void intel_link_compute_m_n(u16 bpp, int nlanes,
int pixel_clock, int link_clock,
struct intel_link_m_n *m_n,
-   bool fec_enable);
+   bool fec_enable, bool is_dsc_fractional_bpp);
 u32 intel_plane_fb_max_stride(struct drm_i915_private *dev_priv,
  u32 pixel_format, u64 modifier);
 enum drm_mode_status
diff --git a/drivers/gpu/drm/i915/display/intel_dp.c 
b/drivers/gpu/drm/i915/display/intel_dp.c
index cb647bb38b12..6e09e21909a1 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -2562,7 +2562,7 @@ intel_dp_drrs_compute_config(struct intel_connector 
*connector,
 
intel_link_compute_m_n(link_bpp, pipe_config->lane_count, pixel_clock,
   pipe_config->port_clock, &pipe_config->dp_m2_n2,
-  pipe_config->fec_enable);
+  pipe_config->fec_enable, false);
 
/* FIXME: abstract this better */
if (pipe_config->splitter.enable)
@@ -2741,7 +2741,8 @@ intel_dp_compute_config(struct intel_encoder *encoder,
   adjusted_mode->crtc_clock,
   pipe_config->port_clock,
   &pipe_config->dp_m_n,
-  pipe_config->fec_enable);
+  pipe_config->fec_enable,
+  pipe_config->dsc.compression_enable);
 
/* FIXME: abstract this better */
if (pipe_config->splitter.enable)
diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c 
b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index 7bf0b6e4ac0b..8f6bd54532cb 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -172,7 +172,8 @@ static int intel_dp_mst_compute_link_config(struct 
intel_encoder *encoder,
   adjusted_mode->crtc_clock,
   crtc_state->port_clock,
   &crtc_state->dp_m_n,
-  crtc_state->fec_enable);
+  crtc_state->fec_enable,
+  false);
crtc_state->dp_m_n.tu = slots;
 
return 0;
@@ -269,7 +270,8 @@ static int intel_dp_dsc_mst_compute_link_config(struct 
intel_encoder *encoder,
   adjusted_mode->crtc_clock,
   crtc_state->port_clock,
   &crtc_state->dp_m_n,
-  crtc_state->fec_enable);
+  crtc_state->fec_enable,
+  crtc_state->dsc.compression_enable);
crtc_state->dp_m_n.tu = slots;
 
return 0;
diff --git a/drivers/gpu/drm/i915/display/intel_fdi.c 
b/drivers/gpu/drm/i915/display/intel_fdi.c
index e12b46a84fa1..15fddabf7c2e 100644
--- a/drivers/gpu/drm/i915/display/intel_fdi.c
+++ b/drivers/gpu/drm/i915/display/intel_fdi.c
@@ -259,7 +259,7 @@ int ilk_fdi_compute_config(struct intel_crtc *crtc,
pipe_config->fdi_lanes = lane;
 
intel_link_compute_m_n(pipe_config-

[PATCH 0/8] Add DSC fractional bpp support

2023-09-12 Thread Mitul Golani
This patch series adds support for DSC fractional compressed bpp
for MTL+. The series starts with some fixes, followed by patches that
lay groundwork to iterate over valid compressed bpps to select the
'best' compressed bpp with optimal link configuration (taken from
upstream series: https://patchwork.freedesktop.org/series/105200/).

The later patches, add changes to accommodate compressed bpp with
fractional part, including changes to QP calculations.
To get the 'best' compressed bpp, we iterate over the valid compressed
bpp values, but with fractional step size 1/16, 1/8, 1/4 or 1/2 as per
sink support.

The last 2 patches add support to depict DSC sink's fractional support,
and debugfs to enforce use of fractional bpp, while choosing an
appropriate compressed bpp.

Ankit Nautiyal (5):
  drm/display/dp: Add helper function to get DSC bpp precision
  drm/i915/display: Store compressed bpp in U6.4 format
  drm/i915/display: Consider fractional vdsc bpp while computing m_n
values
  drm/i915/audio: Consider fractional vdsc bpp while computing tu_data
  drm/i915/dp: Iterate over output bpp with fractional step size

Swati Sharma (2):
  drm/i915/dsc: Add debugfs entry to validate DSC fractional bpp
  drm/i915/dsc: Allow DSC only with fractional bpp when forced from
debugfs

Vandita Kulkarni (1):
  drm/i915/dsc/mtl: Add support for fractional bpp

 drivers/gpu/drm/display/drm_dp_helper.c   | 27 ++
 drivers/gpu/drm/i915/display/icl_dsi.c| 11 +--
 drivers/gpu/drm/i915/display/intel_audio.c| 17 ++--
 drivers/gpu/drm/i915/display/intel_bios.c |  6 +-
 drivers/gpu/drm/i915/display/intel_cdclk.c|  6 +-
 drivers/gpu/drm/i915/display/intel_display.c  |  8 +-
 drivers/gpu/drm/i915/display/intel_display.h  |  2 +-
 .../drm/i915/display/intel_display_debugfs.c  | 83 +++
 .../drm/i915/display/intel_display_types.h|  4 +-
 drivers/gpu/drm/i915/display/intel_dp.c   | 81 +++---
 drivers/gpu/drm/i915/display/intel_dp_mst.c   | 32 ---
 drivers/gpu/drm/i915/display/intel_fdi.c  |  2 +-
 .../i915/display/intel_fractional_helper.h| 36 
 .../gpu/drm/i915/display/intel_qp_tables.c|  3 -
 drivers/gpu/drm/i915/display/intel_vdsc.c | 30 +--
 include/drm/display/drm_dp_helper.h   |  1 +
 16 files changed, 275 insertions(+), 74 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/display/intel_fractional_helper.h

-- 
2.25.1



[PATCH v4] drm/ssd130x: Store the HW buffer in the driver-private CRTC state

2023-09-12 Thread Javier Martinez Canillas
The commit 45b58669e532 ("drm/ssd130x: Allocate buffer in the plane's
.atomic_check() callback") moved the allocation of the intermediate and
HW buffers from the encoder's .atomic_enable callback, to the plane's
.atomic_check callback.

This was suggested by Maxime Ripard, because drivers aren't allowed to
fail after the drm_atomic_helper_swap_state() function has been called.

And the encoder's .atomic_enable happens after the new atomic state has
been swapped, so allocations (that can fail) shouldn't be done there.

But the HW buffer isn't really tied to the plane's state. It has a fixed
size that only depends on the (also fixed) display resolution defined in
the Device Tree Blob.

That buffer can be considered part of the CRTC state, and for this reason
makes more sense to do its allocation in the CRTC .atomic_check callback.

The other allocated buffer (used to store a conversion from the emulated
XR24 format to the native R1 format) is part of the plane's state, since
it will be optional once the driver supports R1 and allows user-space to
set that pixel format.

So let's keep the allocation for it in the plane's .atomic_check callback,
this can't be moved to the CRTC's .atomic_check because changing a format
does not trigger a CRTC mode set.

Reported-by: Geert Uytterhoeven 
Closes: 
https://lore.kernel.org/dri-devel/CAMuHMdWv_QSatDgihr8=2SXHhvp=icnxumzczopwt9q_qio...@mail.gmail.com/
Signed-off-by: Javier Martinez Canillas 
---

Changes in v4:
- Fix a build warning reported by the robot (missing static in helper function).

Changes in v3:
- Call drm_atomic_get_crtc_state() in the plane's .atomic_check (Maxime Ripard).

Changes in v2:
- Drop RFC prefix.
- Fix typo in commit message (Thomas Zimmermann).
- Store the HW buffer in the driver's private CRTC state (Thomas Zimmermann).
- Just use kmalloc() kcalloc() when allocating buffers (Thomas Zimmermann).
- Keep the allocation of the intermediate buffer in the plane's .atomic_check

 drivers/gpu/drm/solomon/ssd130x.c | 153 +++---
 1 file changed, 118 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/solomon/ssd130x.c 
b/drivers/gpu/drm/solomon/ssd130x.c
index 3b4dde09538a..8ab02724f65f 100644
--- a/drivers/gpu/drm/solomon/ssd130x.c
+++ b/drivers/gpu/drm/solomon/ssd130x.c
@@ -141,14 +141,23 @@ const struct ssd130x_deviceinfo ssd130x_variants[] = {
 };
 EXPORT_SYMBOL_NS_GPL(ssd130x_variants, DRM_SSD130X);
 
+struct ssd130x_crtc_state {
+   struct drm_crtc_state base;
+   /* Buffer to store pixels in HW format and written to the panel */
+   u8 *data_array;
+};
+
 struct ssd130x_plane_state {
struct drm_shadow_plane_state base;
/* Intermediate buffer to convert pixels from XRGB to HW format */
u8 *buffer;
-   /* Buffer to store pixels in HW format and written to the panel */
-   u8 *data_array;
 };
 
+static inline struct ssd130x_crtc_state *to_ssd130x_crtc_state(struct 
drm_crtc_state *state)
+{
+   return container_of(state, struct ssd130x_crtc_state, base);
+}
+
 static inline struct ssd130x_plane_state *to_ssd130x_plane_state(struct 
drm_plane_state *state)
 {
return container_of(state, struct ssd130x_plane_state, base.base);
@@ -448,13 +457,11 @@ static int ssd130x_init(struct ssd130x_device *ssd130x)
 }
 
 static int ssd130x_update_rect(struct ssd130x_device *ssd130x,
-  struct ssd130x_plane_state *ssd130x_state,
-  struct drm_rect *rect)
+  struct drm_rect *rect, u8 *buf,
+  u8 *data_array)
 {
unsigned int x = rect->x1;
unsigned int y = rect->y1;
-   u8 *buf = ssd130x_state->buffer;
-   u8 *data_array = ssd130x_state->data_array;
unsigned int width = drm_rect_width(rect);
unsigned int height = drm_rect_height(rect);
unsigned int line_length = DIV_ROUND_UP(width, 8);
@@ -550,12 +557,10 @@ static int ssd130x_update_rect(struct ssd130x_device 
*ssd130x,
return ret;
 }
 
-static void ssd130x_clear_screen(struct ssd130x_device *ssd130x,
-struct ssd130x_plane_state *ssd130x_state)
+static void ssd130x_clear_screen(struct ssd130x_device *ssd130x, u8 
*data_array)
 {
unsigned int page_height = ssd130x->device_info->page_height;
unsigned int pages = DIV_ROUND_UP(ssd130x->height, page_height);
-   u8 *data_array = ssd130x_state->data_array;
unsigned int width = ssd130x->width;
int ret, i;
 
@@ -594,15 +599,13 @@ static void ssd130x_clear_screen(struct ssd130x_device 
*ssd130x,
}
 }
 
-static int ssd130x_fb_blit_rect(struct drm_plane_state *state,
+static int ssd130x_fb_blit_rect(struct drm_framebuffer *fb,
const struct iosys_map *vmap,
-   struct drm_rect *rect)
+   struct drm_rect *rect,
+   u8 *buf, u8 *data_ar

RE: [PATCH 2/8] drm/i915/display: Store compressed bpp in U6.4 format

2023-09-12 Thread Kandpal, Suraj
> Subject: [PATCH 2/8] drm/i915/display: Store compressed bpp in U6.4 format
> 
> From: Ankit Nautiyal 
> 
> DSC parameter bits_per_pixel is stored in U6.4 format.
> The 4 bits represent the fractional part of the bpp.
> Currently we use compressed_bpp member of dsc structure to store only the
> integral part of the bits_per_pixel.
> To store the full bits_per_pixel along with the fractional part, 
> compressed_bpp
> is changed to store bpp in U6.4 format. Integral part is retrieved by simply
> right shifting the member compressed_bpp by 4.
> 
> v2:
> -Use to_bpp_int, to_bpp_frac_dec, to_bpp_x16 helpers while dealing  with
> compressed bpp. (Suraj) -Fix comment styling. (Suraj)
> 
> v3:
> -Add separate file for 6.4 fixed point helper (Jani Nikula) -Add comment for
> magic values (Suraj)
> 

A lot of checkpatch issues have been created due to the renaming you can fix 
that.
Other than that everything else looks good

Regards,
Suraj Kandpal

> Signed-off-by: Ankit Nautiyal 
> Signed-off-by: Mitul Golani 
> Reviewed-by: Suraj Kandpal 
> ---
>  drivers/gpu/drm/i915/display/icl_dsi.c| 11 +++---
>  drivers/gpu/drm/i915/display/intel_audio.c|  3 +-
>  drivers/gpu/drm/i915/display/intel_bios.c |  6 ++--
>  drivers/gpu/drm/i915/display/intel_cdclk.c|  5 +--
>  drivers/gpu/drm/i915/display/intel_display.c  |  2 +-
>  .../drm/i915/display/intel_display_types.h|  3 +-
>  drivers/gpu/drm/i915/display/intel_dp.c   | 29 ---
>  drivers/gpu/drm/i915/display/intel_dp_mst.c   | 22 ++--
>  .../i915/display/intel_fractional_helper.h| 36 +++
>  drivers/gpu/drm/i915/display/intel_vdsc.c |  5 +--
>  10 files changed, 85 insertions(+), 37 deletions(-)  create mode 100644
> drivers/gpu/drm/i915/display/intel_fractional_helper.h
> 
> diff --git a/drivers/gpu/drm/i915/display/icl_dsi.c
> b/drivers/gpu/drm/i915/display/icl_dsi.c
> index ad6488e9c2b2..0f7594b6aa1f 100644
> --- a/drivers/gpu/drm/i915/display/icl_dsi.c
> +++ b/drivers/gpu/drm/i915/display/icl_dsi.c
> @@ -43,6 +43,7 @@
>  #include "intel_de.h"
>  #include "intel_dsi.h"
>  #include "intel_dsi_vbt.h"
> +#include "intel_fractional_helper.h"
>  #include "intel_panel.h"
>  #include "intel_vdsc.h"
>  #include "intel_vdsc_regs.h"
> @@ -330,7 +331,7 @@ static int afe_clk(struct intel_encoder *encoder,
>   int bpp;
> 
>   if (crtc_state->dsc.compression_enable)
> - bpp = crtc_state->dsc.compressed_bpp;
> + bpp =
> +intel_fractional_bpp_from_x16(crtc_state->dsc.compressed_bpp_x16);
>   else
>   bpp = mipi_dsi_pixel_format_to_bpp(intel_dsi->pixel_format);
> 
> @@ -860,7 +861,7 @@ gen11_dsi_set_transcoder_timings(struct
> intel_encoder *encoder,
>* compressed and non-compressed bpp.
>*/
>   if (crtc_state->dsc.compression_enable) {
> - mul = crtc_state->dsc.compressed_bpp;
> + mul =
> +intel_fractional_bpp_from_x16(crtc_state->dsc.compressed_bpp_x16);
>   div = mipi_dsi_pixel_format_to_bpp(intel_dsi->pixel_format);
>   }
> 
> @@ -884,7 +885,7 @@ gen11_dsi_set_transcoder_timings(struct
> intel_encoder *encoder,
>   int bpp, line_time_us, byte_clk_period_ns;
> 
>   if (crtc_state->dsc.compression_enable)
> - bpp = crtc_state->dsc.compressed_bpp;
> + bpp =
> +intel_fractional_bpp_from_x16(crtc_state->dsc.compressed_bpp_x16);
>   else
>   bpp = mipi_dsi_pixel_format_to_bpp(intel_dsi-
> >pixel_format);
> 
> @@ -1451,8 +1452,8 @@ static void gen11_dsi_get_timings(struct
> intel_encoder *encoder,
>   struct drm_display_mode *adjusted_mode =
>   &pipe_config->hw.adjusted_mode;
> 
> - if (pipe_config->dsc.compressed_bpp) {
> - int div = pipe_config->dsc.compressed_bpp;
> + if (pipe_config->dsc.compressed_bpp_x16) {
> + int div =
> +intel_fractional_bpp_from_x16(pipe_config->dsc.compressed_bpp_x16);
>   int mul = mipi_dsi_pixel_format_to_bpp(intel_dsi-
> >pixel_format);
> 
>   adjusted_mode->crtc_htotal =
> diff --git a/drivers/gpu/drm/i915/display/intel_audio.c
> b/drivers/gpu/drm/i915/display/intel_audio.c
> index 19605264a35c..4f1db1581316 100644
> --- a/drivers/gpu/drm/i915/display/intel_audio.c
> +++ b/drivers/gpu/drm/i915/display/intel_audio.c
> @@ -35,6 +35,7 @@
>  #include "intel_crtc.h"
>  #include "intel_de.h"
>  #include "intel_display_types.h"
> +#include "intel_fractional_helper.h"
>  #include "intel_lpe_audio.h"
> 
>  /**
> @@ -528,7 +529,7 @@ static unsigned int calc_hblank_early_prog(struct
> intel_encoder *encoder,
>   h_active = crtc_state->hw.adjusted_mode.crtc_hdisplay;
>   h_total = crtc_state->hw.adjusted_mode.crtc_htotal;
>   pixel_clk = crtc_state->hw.adjusted_mode.crtc_clock;
> - vdsc_bpp = crtc_state->dsc.compressed_bpp;
> + vdsc_bpp =
> +intel_fractional_bpp_from_x16(crtc_state->ds

RE: [PATCH 1/8] drm/display/dp: Add helper function to get DSC bpp precision

2023-09-12 Thread Kandpal, Suraj
> Subject: [PATCH 1/8] drm/display/dp: Add helper function to get DSC bpp
> precision
> 
> From: Ankit Nautiyal 
> 
> Add helper to get the DSC bits_per_pixel precision for the DP sink.
> 

LGTM.

Reviewed-by: Suraj Kandpal 

> Signed-off-by: Ankit Nautiyal 
> ---
>  drivers/gpu/drm/display/drm_dp_helper.c | 27 +
>  include/drm/display/drm_dp_helper.h |  1 +
>  2 files changed, 28 insertions(+)
> 
> diff --git a/drivers/gpu/drm/display/drm_dp_helper.c
> b/drivers/gpu/drm/display/drm_dp_helper.c
> index 8a1b64c57dfd..5c23d5b8fc50 100644
> --- a/drivers/gpu/drm/display/drm_dp_helper.c
> +++ b/drivers/gpu/drm/display/drm_dp_helper.c
> @@ -2323,6 +2323,33 @@ int drm_dp_read_desc(struct drm_dp_aux *aux,
> struct drm_dp_desc *desc,  }  EXPORT_SYMBOL(drm_dp_read_desc);
> 
> +/**
> + * drm_dp_dsc_sink_bpp_incr() - Get bits per pixel increment
> + * @dsc_dpcd: DSC capabilities from DPCD
> + *
> + * Returns the bpp precision supported by the DP sink.
> + */
> +u8 drm_dp_dsc_sink_bpp_incr(const u8
> +dsc_dpcd[DP_DSC_RECEIVER_CAP_SIZE])
> +{
> + u8 bpp_increment_dpcd = dsc_dpcd[DP_DSC_BITS_PER_PIXEL_INC -
> +DP_DSC_SUPPORT];
> +
> + switch (bpp_increment_dpcd) {
> + case DP_DSC_BITS_PER_PIXEL_1_16:
> + return 16;
> + case DP_DSC_BITS_PER_PIXEL_1_8:
> + return 8;
> + case DP_DSC_BITS_PER_PIXEL_1_4:
> + return 4;
> + case DP_DSC_BITS_PER_PIXEL_1_2:
> + return 2;
> + case DP_DSC_BITS_PER_PIXEL_1_1:
> + return 1;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(drm_dp_dsc_sink_bpp_incr);
> +
>  /**
>   * drm_dp_dsc_sink_max_slice_count() - Get the max slice count
>   * supported by the DSC sink.
> diff --git a/include/drm/display/drm_dp_helper.h
> b/include/drm/display/drm_dp_helper.h
> index 3369104e2d25..6968d4d87931 100644
> --- a/include/drm/display/drm_dp_helper.h
> +++ b/include/drm/display/drm_dp_helper.h
> @@ -164,6 +164,7 @@ drm_dp_is_branch(const u8
> dpcd[DP_RECEIVER_CAP_SIZE])  }
> 
>  /* DP/eDP DSC support */
> +u8 drm_dp_dsc_sink_bpp_incr(const u8
> +dsc_dpcd[DP_DSC_RECEIVER_CAP_SIZE]);
>  u8 drm_dp_dsc_sink_max_slice_count(const u8
> dsc_dpcd[DP_DSC_RECEIVER_CAP_SIZE],
>  bool is_edp);
>  u8 drm_dp_dsc_sink_line_buf_depth(const u8
> dsc_dpcd[DP_DSC_RECEIVER_CAP_SIZE]);
> --
> 2.25.1



Re: linux-next: Tree for Sep 11 (drivers/gpu/drm/i915/display/intel_backlight.o)

2023-09-12 Thread Randy Dunlap
Hi Jani,

On 9/12/23 07:52, Randy Dunlap wrote:
> 
> 
> On 9/12/23 00:47, Jani Nikula wrote:
>> On Mon, 11 Sep 2023, Randy Dunlap  wrote:
>>> On 9/10/23 19:11, Stephen Rothwell wrote:
 Hi all,

 Please do *not* include material destined for v6.7 in your linux-next
 included branches until *after* v6.6-rc1 has been released.  Also,
 do *not* rebase your linux-next included branches onto v6.5.

 Changes since 20230908:

 Non-merge commits (relative to Linus' tree): 643
  614 files changed, 227990 insertions(+), 9502 deletions(-)

 
>>>
>>> on x86_64:
>>>
>>> # CONFIG_ACPI is not set
>>> CONFIG_DRM_I915=y
>>> CONFIG_BACKLIGHT_CLASS_DEVICE=m
>>>
>>> I915 selects BACKLIGHT_CLASS_DEVICE if ACPI is set.
>>>
>>> ld: drivers/gpu/drm/i915/display/intel_backlight.o: in function 
>>> `intel_backlight_device_register':
>>> intel_backlight.c:(.text+0x4988): undefined reference to 
>>> `backlight_device_get_by_name'
>>> ld: intel_backlight.c:(.text+0x4a1b): undefined reference to 
>>> `backlight_device_register'
>>> ld: drivers/gpu/drm/i915/display/intel_backlight.o: in function 
>>> `intel_backlight_device_unregister':
>>> intel_backlight.c:(.text+0x4b56): undefined reference to 
>>> `backlight_device_unregister'
>>
>> This comes up periodically. The fix is for i915 to depend on backlight,
>> but it's not possible to fix just i915, as it'll lead to circular deps
>> unless *all* select backlight is switched to depend on backlight.
>>
>> I've gone through it once [1], and not keen on doing it again unless
>> there's buy-in.
>>
>> IS_REACHABLE() is often suggested as a workaround, but I think it's just
>> plain wrong. i915=y backlight=m is not a configuration that makes
>> sense. Kernel configuration is hard enough, there's no point in allowing
>> dumb configs that just silently don't work.
>>
> 
> Yes, IS_REACHABLE() is just fugly nonsense.
> 
> Thanks for the reminder of your attempt(s).
> 
>>
>> BR,
>> Jani.
>>
>>
>> [1] 
>> https://lore.kernel.org/r/1413580403-16225-1-git-send-email-jani.nik...@intel.com

I did a partial patch series (eliminated the I915 problems with 9 patches,
without build testing -- only kconfig testing -- so more changes may be
needed), then I looked at your patch [1] above.

I like it but even if Tomi and Daniel didn't have problems with it,
I am concerned that it would cause problems with existing working .config files.

Still, something should be done about the mixed usage of select and depends on
for BACKLIGHT_CLASS_DEVICE (et al).
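As a rough illustration of the two Kconfig patterns under discussion (a fragment for illustration, not taken from the tree):

```kconfig
# select: forces BACKLIGHT_CLASS_DEVICE on, but only when the "if ACPI"
# guard holds; with CONFIG_ACPI=n the select is skipped, so I915=y plus
# BACKLIGHT_CLASS_DEVICE=m is accepted and the built-in driver fails to
# link against the modular backlight symbols, as reported above.
config DRM_I915
	tristate "Intel 8xx/9xx/G3x/G4x/HD Graphics"
	select BACKLIGHT_CLASS_DEVICE if ACPI

# depends on: the common "X || X=n" idiom instead makes the mismatched
# combination unselectable (I915 can only be y if backlight is y or off).
config DRM_I915
	tristate "Intel 8xx/9xx/G3x/G4x/HD Graphics"
	depends on BACKLIGHT_CLASS_DEVICE || BACKLIGHT_CLASS_DEVICE=n
```

The cost of the second form, as noted in the thread, is that every driver selecting the symbol has to be converted at once to avoid circular dependencies.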

thanks.
-- 
~Randy


Re: [Freedreno] [RFC PATCH v1 01/12] Revert "drm/sysfs: Link DRM connectors to corresponding Type-C connectors"

2023-09-12 Thread Rob Clark
On Mon, Sep 11, 2023 at 2:15 PM Dmitry Baryshkov
 wrote:
>
> On 06/09/2023 16:38, Heikki Krogerus wrote:
> > On Wed, Sep 06, 2023 at 03:48:35PM +0300, Dmitry Baryshkov wrote:
> >> On Wed, 6 Sept 2023 at 15:44, Heikki Krogerus
> >>  wrote:
> >>>
> >>> On Tue, Sep 05, 2023 at 01:56:59PM +0300, Dmitry Baryshkov wrote:
>  Hi Heikki,
> 
>  On Tue, 5 Sept 2023 at 11:50, Heikki Krogerus
>   wrote:
> >
> > Hi Dmitry,
> >
> > On Mon, Sep 04, 2023 at 12:41:39AM +0300, Dmitry Baryshkov wrote:
> >> The kdev->fwnode pointer is never set in drm_sysfs_connector_add(), so
> >> dev_fwnode() checks never succeed, making the respective commit NOP.
> >
> > That's not true. The dev->fwnode is assigned when the device is
> > created on ACPI platforms automatically. If the drm_connector fwnode
> > member is assigned before the device is registered, then that fwnode
> > is assigned also to the device - see 
> > drm_connector_acpi_find_companion().
> >
> > But please note that even if drm_connector does not have anything in
> > its fwnode member, the device may still be assigned fwnode, just based
> > on some other logic (maybe in drivers/acpi/acpi_video.c?).
> >
> >> And if drm_sysfs_connector_add() is modified to set kdev->fwnode, it
> >> breaks drivers already using components (as it was pointed at [1]),
> >> resulting in a deadlock. Lockdep trace is provided below.
> >>
> >> Granted these two issues, it seems impractical to fix this commit in 
> >> any
> >> sane way. Revert it instead.
> >
> > I think there is already user space stuff that relies on these links,
> > so I'm not sure you can just remove them like that. If the component
> > framework is not the correct tool here, then I think you need to
> > suggest some other way of creating them.
> 
>  The issue (that was pointed out during review) is that having a
>  component code in the framework code can lead to lockups. With the
>  patch #2 in place (which is the only logical way to set kdev->fwnode
>  for non-ACPI systems) probing of drivers which use components and set
>  drm_connector::fwnode breaks immediately.
> 
>  Can we move the component part to the respective drivers? With the
>  patch 2 in place, connector->fwnode will be copied to the created
>  kdev's fwnode pointer.
> 
>  Another option might be to make this drm_sysfs component registration 
>  optional.
> >>>
> >>> You don't need to use the component framework at all if there is
> >>> a better way of determining the connection between the DP and its
> >>> Type-C connector (I'm assuming that that's what this series is about).
> >>> You just need the symlinks, not the component.
> >>
> >> The problem is that right now this component registration has become
> >> mandatory. And if I set the kdev->fwnode manually (like in the patch
> >> 2), the kernel hangs inside the component code.
> >> That's why I proposed to move the components to the place where they
> >> are really necessary, e.g. i915 and amd drivers.
> >
> > So why can't we replace the component with the method you are
> > proposing in this series of finding out the Type-C port also with
> > i915, AMD, or whatever driver and platform (that's the only thing that
> > component is used for)?
>
> The drm/msm driver uses drm_bridge for the pipeline (including the last
> DP entry) and the drm_bridge_connector to create the connector. I think
> that enabling i915 and AMD drivers to use drm_bridge fells out of scope
> for this series.
>
>
> > Determining the connection between a DP and its Type-C connector is
> > starting to get really important, so ideally we have a common solution
> > for that.
>
> Yes. This is what we have been discussing with Simon for quite some time
> on #dri-devel.
>
> Unfortunately I think the solution that got merged was pretty much
> hastened in instead of being well thought out. For example, it is also not
> always possible to provide the drm_connector / typec_connector links (as
> you can see from the patch7. Sometimes we can only express that this is
> a Type-C DP connector, but we can not easily point it to the particular
> USB-C port.
>
> So, I'm not sure, how can we proceed here. Currently merged patch breaks
> drm/msm if we even try to use it by setting kdef->fwnode to
> drm_connector->fwnode. The pointed out `drivers/usb/typec/port-mapper.c`
> is an ACPI-only thing, which is not expected to work in non-ACPI cases.

In these cases we revert and try again next cycle

BR,
-R

>
> --
> With best wishes
> Dmitry
>


Re: [PATCH v4 6/6] drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats

2023-09-12 Thread Rob Clark
On Tue, Sep 12, 2023 at 6:46 PM Rob Clark  wrote:
>
> On Tue, Sep 12, 2023 at 2:32 AM Boris Brezillon
>  wrote:
> >
> > On Tue, 12 Sep 2023 09:37:00 +0100
> > Adrián Larumbe  wrote:
> >
> > > The current implementation will try to pick the highest available size
> > > display unit as soon as the BO size exceeds that of the previous
> > > multiplier. That can lead to loss of precision in BO's whose size is
> > > not a multiple of a MiB.
> > >
> > > Fix it by changing the unit selection criteria.
> > >
> > > For much bigger BO's, their size will naturally be aligned on something
> > > bigger than a 4 KiB page, so in practice it is very unlikely their display
> > > unit would default to KiB.
> >
> > Let's wait for Rob's opinion on this.
>
> This would mean that if you have SZ_1G + SZ_1K worth of buffers, you'd
> report the result in KiB.. which seems like overkill to me, esp given
> that the result is just a snapshot in time of a figure that
> realistically is dynamic.
>
> Maybe if you have SZ_1G+SZ_1K worth of buffers you should report the
> result with more precision than GiB, but more than MiB seems a bit
> overkill.
>
> BR,
> -R
>
> > >
> > > Signed-off-by: Adrián Larumbe 
> > > ---
> > >  drivers/gpu/drm/drm_file.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> > > index 762965e3d503..bf7d2fe46bfa 100644
> > > --- a/drivers/gpu/drm/drm_file.c
> > > +++ b/drivers/gpu/drm/drm_file.c
> > > @@ -879,7 +879,7 @@ static void print_size(struct drm_printer *p, const 
> > > char *stat,
> > >   unsigned u;
> > >
> > >   for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> > > - if (sz < SZ_1K)

btw, I was thinking more along the lines of:

   if (sz < 10*SZ_1K)

(or perhaps maybe 100*SZ_1K)

I mean, any visualization tool is going to scale the y axis based on
the order of magnitude.. and if I'm looking at the fdinfo with my
eyeballs I don't want to count the # of digits manually to do the
conversion in my head.  The difference btwn 4 or 5 or maybe 6 digits
is easy enough to eyeball, but more than that is too much for my
eyesight, and I'm not seeing how it is useful ;-)

But if someone really has a valid use case for having precision in 1KB
then I'm willing to be overruled.  But I'm not a fan of the earlier
approach of different drivers reporting results differently, the whole
point of fdinfo was to have some standardized reporting.

BR,
-R

> > > + if (sz & (SZ_1K - 1))
> > >   break;
> > >   sz = div_u64(sz, SZ_1K);
> > >   }
> >


Re: [PATCH v4 6/6] drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats

2023-09-12 Thread Rob Clark
On Tue, Sep 12, 2023 at 2:32 AM Boris Brezillon
 wrote:
>
> On Tue, 12 Sep 2023 09:37:00 +0100
> Adrián Larumbe  wrote:
>
> > The current implementation will try to pick the highest available size
> > display unit as soon as the BO size exceeds that of the previous
> > multiplier. That can lead to loss of precision in BO's whose size is
> > not a multiple of a MiB.
> >
> > Fix it by changing the unit selection criteria.
> >
> > For much bigger BO's, their size will naturally be aligned on something
> > bigger than a 4 KiB page, so in practice it is very unlikely their display
> > unit would default to KiB.
>
> Let's wait for Rob's opinion on this.

This would mean that if you have SZ_1G + SZ_1K worth of buffers, you'd
report the result in KiB.. which seems like overkill to me, esp given
that the result is just a snapshot in time of a figure that
realistically is dynamic.

Maybe if you have SZ_1G+SZ_1K worth of buffers you should report the
result with more precision than GiB, but more than MiB seems a bit
overkill.

BR,
-R

> >
> > Signed-off-by: Adrián Larumbe 
> > ---
> >  drivers/gpu/drm/drm_file.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> > index 762965e3d503..bf7d2fe46bfa 100644
> > --- a/drivers/gpu/drm/drm_file.c
> > +++ b/drivers/gpu/drm/drm_file.c
> > @@ -879,7 +879,7 @@ static void print_size(struct drm_printer *p, const 
> > char *stat,
> >   unsigned u;
> >
> >   for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> > - if (sz < SZ_1K)
> > + if (sz & (SZ_1K - 1))
> >   break;
> >   sz = div_u64(sz, SZ_1K);
> >   }
>


Re: [PATCH] staging: fbtft: Removed unnecessary parenthesis around conditions to comply with the checkpatch coding style.

2023-09-12 Thread Bagas Sanjaya
On Wed, Sep 13, 2023 at 11:02:13AM +1000, Angus Gardner wrote:
> ---
>  drivers/staging/fbtft/fb_ra8875.c | 11 ---
>  1 file changed, 4 insertions(+), 7 deletions(-)

No patch description and SoB, so Greg can't take this as-is.

> - if ((par->info->var.xres == 320) && (par->info->var.yres == 240)) {
> + if (par->info->var.xres == 320 && par->info->var.yres == 240) {

Greg prefers explicit parentheses on complex expressions (see [1] and [2]
for examples), hence NAK.

Thanks.

[1]: https://lore.kernel.org/linux-staging/zcwgozqdh1kwt...@kroah.com/
[2]: https://lore.kernel.org/linux-staging/y%2fiaytkk4vsok...@kroah.com/

-- 
An old man doll... just what I always wanted! - Clara




linux-next: manual merge of the drm-misc tree with Linus' tree

2023-09-12 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the drm-misc tree got a conflict in:

  drivers/gpu/drm/mediatek/mtk_dpi.c

between commits:

  47d4bb6bbcdb ("drm/mediatek: mtk_dpi: Simplify with devm_drm_bridge_add()")
  90c95c3892dd ("drm/mediatek: mtk_dpi: Switch to .remove_new() void callback")

from Linus' tree and commit:

  c04ca6bbb7ea ("drm/mediatek: Convert to platform remove callback returning 
void")

from the drm-misc tree.

I fixed it up (the latter is the same as 90c95c3892dd) and can carry the
fix as necessary. This is now fixed as far as linux-next is concerned,
but any non trivial conflicts should be mentioned to your upstream
maintainer when your tree is submitted for merging.  You may also want
to consider cooperating with the maintainer of the conflicting tree to
minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell




Re: [PATCH v16 15/20] drm/shmem-helper: Add memory shrinker

2023-09-12 Thread Dmitry Osipenko
On 9/5/23 11:03, Boris Brezillon wrote:
>>* But
>> + * acquiring the obj lock in 
>> drm_gem_shmem_release_pages_locked() can
>> + * cause a locking order inversion between 
>> reservation_ww_class_mutex
>> + * and fs_reclaim.
>> + *
>> + * This deadlock is not actually possible, because no one should
>> + * be already holding the lock when drm_gem_shmem_free() is 
>> called.
>> + * Unfortunately lockdep is not aware of this detail.  So when 
>> the
>> + * refcount drops to zero, don't touch the reservation lock.
>> + */
>> +if (shmem->got_pages_sgt &&
>> +refcount_dec_and_test(&shmem->pages_use_count)) {
>> +drm_gem_shmem_do_release_pages_locked(shmem);
>> +shmem->got_pages_sgt = false;
>>  }
> Leaking memory is the right thing to do if pages_use_count > 1 (it's
> better to leak than having someone access memory it no longer owns), but
> I think it's worth mentioning in the above comment.

It's unlikely that it would be only a leak without a follow-up
use-after-free. Neither is acceptable.

The drm_gem_shmem_free() could be changed such that kernel won't blow up
on a refcnt bug, but that's not worthwhile doing because drivers
shouldn't have silly bugs.

-- 
Best regards,
Dmitry



Re: [PATCH v16 09/20] drm/shmem-helper: Remove obsoleted is_iomem test

2023-09-12 Thread Dmitry Osipenko
On 9/5/23 09:46, Boris Brezillon wrote:
> On Sun,  3 Sep 2023 20:07:25 +0300
> Dmitry Osipenko  wrote:
> 
>> Everything that uses the mapped buffer should be agnostic to is_iomem.
>> The only reason for the is_iomem test is that we're setting shmem->vaddr
>> to the returned map->vaddr. Now that the shmem->vaddr code is gone, remove
>> the obsoleted is_iomem test to clean up the code.
>>
>> Suggested-by: Thomas Zimmermann 
>> Signed-off-by: Dmitry Osipenko 
>> ---
>>  drivers/gpu/drm/drm_gem_shmem_helper.c | 6 --
>>  1 file changed, 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
>> b/drivers/gpu/drm/drm_gem_shmem_helper.c
>> index 2b50d1a7f718..25e99468ced2 100644
>> --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
>> +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
>> @@ -317,12 +317,6 @@ int drm_gem_shmem_vmap_locked(struct 
>> drm_gem_shmem_object *shmem,
>>  
>>  if (obj->import_attach) {
>>  ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
>> -if (!ret) {
>> -if (drm_WARN_ON(obj->dev, map->is_iomem)) {
>> -dma_buf_vunmap(obj->import_attach->dmabuf, map);
>> -return -EIO;
>> -}
>> -}
> 
> Given there's nothing to unroll for the dmabuf case, I think it'd be
> good to return directly and skip all the error paths. It would also
> allow you to get rid of one indentation level for the !dmabuf path.
> 
>   if (obj->import_attach)
>   return dma_buf_vmap(obj->import_attach->dmabuf, map);
> 
>   // non-dmabuf vmap logic here...

There is a common error message there that uses the common ret. The
error unwinding could be improved, but then it should be a separate
patch as it's unrelated to the change made here.
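
For reference, the early-return shape suggested in the review can be sketched standalone like this (hypothetical names, not the real drm_gem_shmem_vmap_locked()):

```c
#include <assert.h>

/*
 * Hypothetical sketch of the early-return shape suggested in the review;
 * names are illustrative, not the real drm_gem_shmem_vmap_locked().
 * Returning directly for the dma-buf import case removes one level of
 * indentation from the non-dmabuf path and leaves nothing to unroll.
 */
struct demo_obj {
	int is_import;
	int vmapped_via_dmabuf;
	int vmapped_locally;
};

static int demo_dma_buf_vmap(struct demo_obj *obj)
{
	obj->vmapped_via_dmabuf = 1;
	return 0; /* pretend the dma-buf vmap succeeded */
}

static int demo_vmap_locked(struct demo_obj *obj)
{
	if (obj->is_import)
		return demo_dma_buf_vmap(obj);

	/* non-dmabuf vmap logic follows here, one indent level less */
	obj->vmapped_locally = 1;
	return 0;
}
```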

-- 
Best regards,
Dmitry



Re: [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation

2023-09-12 Thread Danilo Krummrich
On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström wrote:
> 
> On 9/12/23 18:50, Danilo Krummrich wrote:
> > On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote:
> > > Hi, Danilo,
> > > 
> > > On 9/9/23 17:31, Danilo Krummrich wrote:
> > > > So far the DRM GPUVA manager offers common infrastructure to track GPU 
> > > > VA
> > > > allocations and mappings, generically connect GPU VA mappings to their
> > > > backing buffers and perform more complex mapping operations on the GPU 
> > > > VA
> > > > space.
> > > > 
> > > > However, there are more design patterns commonly used by drivers, which
> > > > can potentially be generalized in order to make the DRM GPUVA manager
> > > > represent a basic GPU-VM implementation. In this context, this patch 
> > > > aims
> > > > at generalizing the following elements.
> > > > 
> > > > 1) Provide a common dma-resv for GEM objects not being used outside of
> > > >  this GPU-VM.
> > > > 
> > > > 2) Provide tracking of external GEM objects (GEM objects which are
> > > >  shared with other GPU-VMs).
> > > > 
> > > > 3) Provide functions to efficiently lock all GEM objects dma-resv the
> > > >  GPU-VM contains mappings of.
> > > > 
> > > > 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
> > > >  of, such that validation of evicted GEM objects is accelerated.
> > > > 
> > > > 5) Provide some convenience functions for common patterns.
> > > > 
> > > > Rather than being designed as a "framework", the target is to make all
> > > > features appear as a collection of optional helper functions, such that
> > > > drivers are free to make use of the DRM GPUVA managers basic
> > > > functionality and opt-in for other features without setting any feature
> > > > flags, just by making use of the corresponding functions.
> > > > 
> > > > Big kudos to Boris Brezillon for his help to figure out locking for 
> > > > drivers
> > > > updating the GPU VA space within the fence signalling path.
> > > > 
> > > > Suggested-by: Matthew Brost 
> > > > Signed-off-by: Danilo Krummrich 
> > > > ---
> > > >drivers/gpu/drm/drm_gpuvm.c | 516 
> > > > 
> > > >include/drm/drm_gpuvm.h | 197 ++
> > > >2 files changed, 713 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> > > > index f4411047dbb3..8e62a043f719 100644
> > > > --- a/drivers/gpu/drm/drm_gpuvm.c
> > > > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > > > @@ -73,6 +73,21 @@
> > > > * &drm_gem_object list of &drm_gpuvm_bos for an existing instance 
> > > > of this
> > > > * particular combination. If not existent a new instance is created 
> > > > and linked
> > > > * to the &drm_gem_object.
> > > > + *
> > > > + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are 
> > > > also used
> > > > + * as entry for the &drm_gpuvm's lists of external and evicted 
> > > > objects. Those
> > > > + * lists are maintained in order to accelerate locking of dma-resv 
> > > > locks and
> > > > + * validation of evicted objects bound in a &drm_gpuvm. For instance 
> > > > all
> > > > + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by 
> > > > calling
> > > > + * drm_gpuvm_exec_lock(). Once locked drivers can call 
> > > > drm_gpuvm_validate() in
> > > > + * order to validate all evicted &drm_gem_objects. It is also possible 
> > > > to lock
> > > > + * additional &drm_gem_objects by providing the corresponding 
> > > > parameters to
> > > > + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while 
> > > > making
> > > > + * use of helper functions such as drm_gpuvm_prepare_range() or
> > > > + * drm_gpuvm_prepare_objects().
> > > > + *
> > > > + * Every bound &drm_gem_object is treated as an external object when its 
> > > > &dma_resv
> > > > + * structure is different than the &drm_gpuvm's common &dma_resv 
> > > > structure.
> > > > */
> > > >/**
> > > > @@ -420,6 +435,20 @@
> > > > * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm 
> > > > and
> > > > * &drm_gem_object must be able to observe previous creations and 
> > > > destructions
> > > > * of &drm_gpuvm_bos in order to keep instances unique.
> > > > + *
> > > > + * The &drm_gpuvm's lists for keeping track of external and evicted 
> > > > objects are
> > > > + * protected against concurrent insertion / removal and iteration 
> > > > internally.
> > > > + *
> > > > + * However, drivers still need to protect concurrent calls to 
> > > > functions
> > > > + * iterating those lists, such as drm_gpuvm_validate() and
> > > > + * drm_gpuvm_prepare_objects(). Every such function contains a 
> > > > particular
> > > > + * comment and lockdep checks if possible.
> > > > + *
> > > > + * Functions adding or removing entries from those lists, such as
> > > > + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be called 
> > > > with external

Re: [PATCH v3] drm/ssd130x: Store the HW buffer in the driver-private CRTC state

2023-09-12 Thread kernel test robot
Hi Javier,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-misc/drm-misc-next]
[cannot apply to linus/master v6.6-rc1 next-20230912]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Javier-Martinez-Canillas/drm-ssd130x-Store-the-HW-buffer-in-the-driver-private-CRTC-state/20230912-191205
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:
https://lore.kernel.org/r/20230912110946.944791-1-javierm%40redhat.com
patch subject: [PATCH v3] drm/ssd130x: Store the HW buffer in the 
driver-private CRTC state
config: i386-randconfig-063-20230913 
(https://download.01.org/0day-ci/archive/20230913/202309130552.3iv4beij-...@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20230913/202309130552.3iv4beij-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202309130552.3iv4beij-...@intel.com/

sparse warnings: (new ones prefixed by >>)
>> drivers/gpu/drm/solomon/ssd130x.c:810:5: sparse: sparse: symbol 
>> 'ssd130x_crtc_helper_atomic_check' was not declared. Should it be static?

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Re: [PATCH v2] drm/simpledrm: Add support for multiple "power-domains"

2023-09-12 Thread Eric Curtin
On Tue, 12 Sept 2023 at 21:30, Janne Grunau via B4 Relay
 wrote:
>
> From: Janne Grunau 
>
> Multiple power domains need to be handled explicitly in each driver. The
> driver core can not handle it automatically since it is not aware of
> power sequencing requirements the hardware might have. This is not a
> problem for simpledrm since everything is expected to be powered on by
> the bootloader. simpledrm just has to ensure it remains powered on during
> its lifetime.
> This is required on Apple silicon M2 and M2 Pro/Max/Ultra desktop
> systems. The HDMI output initialized by the bootloader requires keeping
> the display controller and a DP phy power domain on.
>
> Signed-off-by: Janne Grunau 

Reviewed-by: Eric Curtin 

Is mise le meas/Regards,

Eric Curtin

> ---
> Changes in v2:
> - removed broken drm_err() log statement only meant for debugging
> - removed commented cast
> - use correct format specifier for 'int' in log statement
> - add 'continue;' after failure to get device for power_domain
> - use drm_warn() in non fatal error cases
> - removed duplicate PTR_ERR conversion
> - Link to v1: 
> https://lore.kernel.org/r/20230910-simpledrm-multiple-power-domains-v1-1-f8718aefc...@jannau.net
> ---
>  drivers/gpu/drm/tiny/simpledrm.c | 105 
> +++
>  1 file changed, 105 insertions(+)
>
> diff --git a/drivers/gpu/drm/tiny/simpledrm.c 
> b/drivers/gpu/drm/tiny/simpledrm.c
> index ff86ba1ae1b8..9c597461d1e2 100644
> --- a/drivers/gpu/drm/tiny/simpledrm.c
> +++ b/drivers/gpu/drm/tiny/simpledrm.c
> @@ -6,6 +6,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  #include 
> @@ -227,6 +228,12 @@ struct simpledrm_device {
> unsigned int regulator_count;
> struct regulator **regulators;
>  #endif
> +   /* power-domains */
> +#if defined CONFIG_OF && defined CONFIG_PM_GENERIC_DOMAINS
> +   int pwr_dom_count;
> +   struct device **pwr_dom_devs;
> +   struct device_link **pwr_dom_links;
> +#endif
>
> /* simplefb settings */
> struct drm_display_mode mode;
> @@ -468,6 +475,101 @@ static int simpledrm_device_init_regulators(struct 
> simpledrm_device *sdev)
>  }
>  #endif
>
> +#if defined CONFIG_OF && defined CONFIG_PM_GENERIC_DOMAINS
> +/*
> + * Generic power domain handling code.
> + *
> + * Here we handle the power-domains properties of our "simple-framebuffer"
> + * dt node. This is only necessary if there is more than one power-domain.
> + * A single power-domain is handled automatically by the driver core. 
> Multiple
> + * power-domains have to be handled by drivers since the driver core can't 
> know
> + * the correct power sequencing. Power sequencing is not an issue for 
> simpledrm
> + * since the bootloader has put the power domains already in the correct 
> state.
> + * simpledrm has only to ensure they remain active for its lifetime.
> + *
> + * When the driver unloads, we detach from the power-domains.
> + *
> + * We only complain about errors here, no action is taken as the most likely
> + * error can only happen due to a mismatch between the bootloader which set
> + * up the "simple-framebuffer" dt node, and the PM domain providers in the
> + * device tree. Chances are that there are no adverse effects, and if there 
> are,
> + * a clean teardown of the fb probe will not help us much either. So just
> + * complain and carry on, and hope that the user actually gets a working fb 
> at
> + * the end of things.
> + */
> +static void simpledrm_device_detach_genpd(void *res)
> +{
> +   int i;
> +   struct simpledrm_device *sdev = res;
> +
> +   if (sdev->pwr_dom_count <= 1)
> +   return;
> +
> +   for (i = sdev->pwr_dom_count - 1; i >= 0; i--) {
> +   if (sdev->pwr_dom_links[i])
> +   device_link_del(sdev->pwr_dom_links[i]);
> +   if (!IS_ERR_OR_NULL(sdev->pwr_dom_devs[i]))
> +   dev_pm_domain_detach(sdev->pwr_dom_devs[i], true);
> +   }
> +}
> +
> +static int simpledrm_device_attach_genpd(struct simpledrm_device *sdev)
> +{
> +   struct device *dev = sdev->dev.dev;
> +   int i;
> +
> +   sdev->pwr_dom_count = of_count_phandle_with_args(dev->of_node, 
> "power-domains",
> +
> "#power-domain-cells");
> +   /*
> +* Single power-domain devices are handled by the driver core; nothing to 
> do
> +* here. The same for device nodes without "power-domains" property.
> +*/
> +   if (sdev->pwr_dom_count <= 1)
> +   return 0;
> +
> +   sdev->pwr_dom_devs = devm_kcalloc(dev, sdev->pwr_dom_count,
> +  sizeof(*sdev->pwr_dom_devs),
> +  GFP_KERNEL);
> +   if (!sdev->pwr_dom_devs)
> +   return -ENOMEM;
> +
> +   sdev->pwr_dom_links = devm_kcalloc(dev, sdev->pwr_dom_count,
> +   

[PATCH v2] drm/simpledrm: Add support for multiple "power-domains"

2023-09-12 Thread Janne Grunau via B4 Relay
From: Janne Grunau 

Multiple power domains need to be handled explicitly in each driver. The
driver core can not handle it automatically since it is not aware of
power sequencing requirements the hardware might have. This is not a
problem for simpledrm since everything is expected to be powered on by
the bootloader. simpledrm just has to ensure it remains powered on during
its lifetime.
This is required on Apple silicon M2 and M2 Pro/Max/Ultra desktop
systems. The HDMI output initialized by the bootloader requires keeping
the display controller and a DP phy power domain on.

Signed-off-by: Janne Grunau 
---
Changes in v2:
- removed broken drm_err() log statement only meant for debugging
- removed commented cast
- use correct format specifier for 'int' in log statement
- add 'continue;' after failure to get device for power_domain
- use drm_warn() in non fatal error cases
- removed duplicate PTR_ERR conversion
- Link to v1: 
https://lore.kernel.org/r/20230910-simpledrm-multiple-power-domains-v1-1-f8718aefc...@jannau.net
---
 drivers/gpu/drm/tiny/simpledrm.c | 105 +++
 1 file changed, 105 insertions(+)

diff --git a/drivers/gpu/drm/tiny/simpledrm.c b/drivers/gpu/drm/tiny/simpledrm.c
index ff86ba1ae1b8..9c597461d1e2 100644
--- a/drivers/gpu/drm/tiny/simpledrm.c
+++ b/drivers/gpu/drm/tiny/simpledrm.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -227,6 +228,12 @@ struct simpledrm_device {
unsigned int regulator_count;
struct regulator **regulators;
 #endif
+   /* power-domains */
+#if defined CONFIG_OF && defined CONFIG_PM_GENERIC_DOMAINS
+   int pwr_dom_count;
+   struct device **pwr_dom_devs;
+   struct device_link **pwr_dom_links;
+#endif
 
/* simplefb settings */
struct drm_display_mode mode;
@@ -468,6 +475,101 @@ static int simpledrm_device_init_regulators(struct 
simpledrm_device *sdev)
 }
 #endif
 
+#if defined CONFIG_OF && defined CONFIG_PM_GENERIC_DOMAINS
+/*
+ * Generic power domain handling code.
+ *
+ * Here we handle the power-domains properties of our "simple-framebuffer"
+ * dt node. This is only necessary if there is more than one power-domain.
+ * A single power-domain is handled automatically by the driver core. Multiple
+ * power-domains have to be handled by drivers since the driver core can't know
+ * the correct power sequencing. Power sequencing is not an issue for simpledrm
+ * since the bootloader has put the power domains already in the correct state.
+ * simpledrm has only to ensure they remain active for its lifetime.
+ *
+ * When the driver unloads, we detach from the power-domains.
+ *
+ * We only complain about errors here, no action is taken as the most likely
+ * error can only happen due to a mismatch between the bootloader which set
+ * up the "simple-framebuffer" dt node, and the PM domain providers in the
+ * device tree. Chances are that there are no adverse effects, and if there 
are,
+ * a clean teardown of the fb probe will not help us much either. So just
+ * complain and carry on, and hope that the user actually gets a working fb at
+ * the end of things.
+ */
+static void simpledrm_device_detach_genpd(void *res)
+{
+   int i;
+   struct simpledrm_device *sdev = res;
+
+   if (sdev->pwr_dom_count <= 1)
+   return;
+
+   for (i = sdev->pwr_dom_count - 1; i >= 0; i--) {
+   if (sdev->pwr_dom_links[i])
+   device_link_del(sdev->pwr_dom_links[i]);
+   if (!IS_ERR_OR_NULL(sdev->pwr_dom_devs[i]))
+   dev_pm_domain_detach(sdev->pwr_dom_devs[i], true);
+   }
+}
+
+static int simpledrm_device_attach_genpd(struct simpledrm_device *sdev)
+{
+   struct device *dev = sdev->dev.dev;
+   int i;
+
+   sdev->pwr_dom_count = of_count_phandle_with_args(dev->of_node, 
"power-domains",
+"#power-domain-cells");
+   /*
+* Single power-domain devices are handled by the driver core; nothing to do
+* here. The same for device nodes without "power-domains" property.
+*/
+   if (sdev->pwr_dom_count <= 1)
+   return 0;
+
+   sdev->pwr_dom_devs = devm_kcalloc(dev, sdev->pwr_dom_count,
+  sizeof(*sdev->pwr_dom_devs),
+  GFP_KERNEL);
+   if (!sdev->pwr_dom_devs)
+   return -ENOMEM;
+
+   sdev->pwr_dom_links = devm_kcalloc(dev, sdev->pwr_dom_count,
+   sizeof(*sdev->pwr_dom_links),
+   GFP_KERNEL);
+   if (!sdev->pwr_dom_links)
+   return -ENOMEM;
+
+   for (i = 0; i < sdev->pwr_dom_count; i++) {
+   sdev->pwr_dom_devs[i] = dev_pm_domain_attach_by_id(dev, i);
+   if (IS_ERR(sdev->pwr_dom_devs[i])) {
+   int ret = PTR_ERR(

Re: [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation

2023-09-12 Thread Thomas Hellström



On 9/12/23 18:50, Danilo Krummrich wrote:

On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote:

Hi, Danilo,

On 9/9/23 17:31, Danilo Krummrich wrote:

So far the DRM GPUVA manager offers common infrastructure to track GPU VA
allocations and mappings, generically connect GPU VA mappings to their
backing buffers and perform more complex mapping operations on the GPU VA
space.

However, there are more design patterns commonly used by drivers, which
can potentially be generalized in order to make the DRM GPUVA manager
represent a basic GPU-VM implementation. In this context, this patch aims
at generalizing the following elements.

1) Provide a common dma-resv for GEM objects not being used outside of
 this GPU-VM.

2) Provide tracking of external GEM objects (GEM objects which are
 shared with other GPU-VMs).

3) Provide functions to efficiently lock all GEM objects dma-resv the
 GPU-VM contains mappings of.

4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
 of, such that validation of evicted GEM objects is accelerated.

5) Provide some convenience functions for common patterns.

Rather than being designed as a "framework", the target is to make all
features appear as a collection of optional helper functions, such that
drivers are free to make use of the DRM GPUVA managers basic
functionality and opt-in for other features without setting any feature
flags, just by making use of the corresponding functions.

Big kudos to Boris Brezillon for his help to figure out locking for drivers
updating the GPU VA space within the fence signalling path.

Suggested-by: Matthew Brost 
Signed-off-by: Danilo Krummrich 
---
   drivers/gpu/drm/drm_gpuvm.c | 516 
   include/drm/drm_gpuvm.h | 197 ++
   2 files changed, 713 insertions(+)

diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
index f4411047dbb3..8e62a043f719 100644
--- a/drivers/gpu/drm/drm_gpuvm.c
+++ b/drivers/gpu/drm/drm_gpuvm.c
@@ -73,6 +73,21 @@
* &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
* particular combination. If not existent a new instance is created and 
linked
* to the &drm_gem_object.
+ *
+ * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
+ * as entry for the &drm_gpuvm's lists of external and evicted objects. Those
+ * lists are maintained in order to accelerate locking of dma-resv locks and
+ * validation of evicted objects bound in a &drm_gpuvm. For instance all
+ * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling
+ * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in
+ * order to validate all evicted &drm_gem_objects. It is also possible to lock
+ * additional &drm_gem_objects by providing the corresponding parameters to
+ * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
+ * use of helper functions such as drm_gpuvm_prepare_range() or
+ * drm_gpuvm_prepare_objects().
+ *
+ * Every bound &drm_gem_object is treated as an external object when its &dma_resv
+ * structure is different than the &drm_gpuvm's common &dma_resv structure.
*/
   /**
@@ -420,6 +435,20 @@
* Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
* &drm_gem_object must be able to observe previous creations and 
destructions
* of &drm_gpuvm_bos in order to keep instances unique.
+ *
+ * The &drm_gpuvm's lists for keeping track of external and evicted objects are
+ * protected against concurrent insertion / removal and iteration internally.
+ *
+ * However, drivers still need to protect concurrent calls to functions
+ * iterating those lists, such as drm_gpuvm_validate() and
+ * drm_gpuvm_prepare_objects(). Every such function contains a particular
+ * comment and lockdep checks if possible.
+ *
+ * Functions adding or removing entries from those lists, such as
+ * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be called with 
external
+ * locks being held, e.g. in order to avoid the corresponding list being
+ * (safely) modified while potentially being iterated by other API functions.
+ * However, this is entirely optional.
*/
   /**
@@ -632,6 +661,131 @@
*   }
*/
+/**
+ * get_next_vm_bo_from_list() - get the next vm_bo element
+ * @__gpuvm: The GPU VM
+ * @__list_name: The name of the list we're iterating on
+ * @__local_list: A pointer to the local list used to store already iterated 
items
+ * @__prev_vm_bo: The previous element we got from 
drm_gpuvm_get_next_cached_vm_bo()
+ *
+ * This helper is here to provide lockless list iteration. Lockless as in, the
+ * iterator releases the lock immediately after picking the first element from
+ * the list, so list insertion and deletion can happen concurrently.

Are the list spinlocks needed for that async state update from within the
dma-fence critical section we've discussed previously?

Yes, but also for ot

Re: [PATCH v7] Documentation/gpu: Add a VM_BIND async document

2023-09-12 Thread Zanoni, Paulo R
On Mon, 2023-09-11 at 14:47 +0200, Thomas Hellström wrote:
> Add a motivation for and description of asynchronous VM_BIND operation
> 
> v2:
> - Fix typos (Nirmoy Das)
> - Improve the description of a memory fence (Oak Zeng)
> - Add a reference to the document in the Xe RFC.
> - Add pointers to sample uAPI suggestions
> v3:
> - Address review comments (Danilo Krummrich)
> - Formatting fixes
> v4:
> - Address typos (Francois Dugast)
> - Explain why in-fences are not allowed for VM_BIND operations for long-
>   running workloads (Matthew Brost)
> v5:
> - More typo- and style fixing
> - Further clarify the implications of disallowing in-fences for VM_BIND
>   operations for long-running workloads (Matthew Brost)
> v6:
> - Point out that a gpu_vm is a virtual GPU Address space.
>   (Danilo Krummrich)
> - For an explanation of dma-fences point to the dma-fence documentation.
>   (Paolo Zanoni)
> - Clarify that VM_BIND errors are reported synchronously. (Paulo Zanoni)
> - Use an rst doc reference when pointing to the async vm_bind document
>   from the xe merge plan.
> - Add the VM_BIND documentation to the drm documentation table-of-content,
>   using an intermediate "Misc DRM driver uAPI- and feature implementation
>   guidelines"
> v7:
> - Update the error handling documentation to remove the VM error state.
> 
> Cc: Paulo R Zanoni 

I was asked to give my input (ack or nack) here, since I'm working on
the Sparse implementation for Anv, which makes use of vm_bind (mesa MR
23045).

I understand that this text is mostly "implementation guidelines for
drivers". While I understand what's written in the text, I think it's
way too vague for me - wearing my user-space user-of-these-interfaces
hat - to make sense of:  I still don't know what exactly I'm supposed
to do with it, especially the error handling paths.

I was waiting to see if a proposal for xe.ko implementation would
appear before I could actually make full sense of this text and then
ack or nack anything. That's still my plan. More below.


> Signed-off-by: Thomas Hellström 
> Acked-by: Nirmoy Das 
> Reviewed-by: Danilo Krummrich 
> Reviewed-by: Matthew Brost 
> Reviewed-by: Rodrigo Vivi 
> ---
>  Documentation/gpu/drm-vm-bind-async.rst   | 155 ++
>  .../gpu/implementation_guidelines.rst |   9 +
>  Documentation/gpu/index.rst   |   1 +
>  Documentation/gpu/rfc/xe.rst  |   4 +-
>  4 files changed, 167 insertions(+), 2 deletions(-)
>  create mode 100644 Documentation/gpu/drm-vm-bind-async.rst
>  create mode 100644 Documentation/gpu/implementation_guidelines.rst
> 
> diff --git a/Documentation/gpu/drm-vm-bind-async.rst 
> b/Documentation/gpu/drm-vm-bind-async.rst
> new file mode 100644
> index ..f12f794408b9
> --- /dev/null
> +++ b/Documentation/gpu/drm-vm-bind-async.rst
> @@ -0,0 +1,155 @@
> +.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
> +
> +
> +Asynchronous VM_BIND
> +
> +
> +Nomenclature:
> +=
> +
> +* ``VRAM``: On-device memory. Sometimes referred to as device local memory.
> +
> +* ``gpu_vm``: A virtual GPU address space. Typically per process, but
> +  can be shared by multiple processes.
> +
> +* ``VM_BIND``: An operation or a list of operations to modify a gpu_vm using
> +  an IOCTL. The operations include mapping and unmapping system- or
> +  VRAM memory.
> +
> +* ``syncobj``: A container that abstracts synchronization objects. The
> +  synchronization objects can be either generic, like dma-fences or
> +  driver specific. A syncobj typically indicates the type of the
> +  underlying synchronization object.
> +
> +* ``in-syncobj``: Argument to a VM_BIND IOCTL, the VM_BIND operation waits
> +  for these before starting.
> +
> +* ``out-syncobj``: Argument to a VM_BIND_IOCTL, the VM_BIND operation
> +  signals these when the bind operation is complete.
> +
> +* ``dma-fence``: A cross-driver synchronization object. A basic
> +  understanding of dma-fences is required to digest this
> +  document. Please refer to the ``DMA Fences`` section of the
> +  :doc:`dma-buf doc `.
> +
> +* ``memory fence``: A synchronization object, different from a dma-fence.
> +  A memory fence uses the value of a specified memory location to determine
> +  signaled status. A memory fence can be awaited and signaled by both
> +  the GPU and CPU. Memory fences are sometimes referred to as
> +  user-fences, userspace-fences or gpu futexes and do not necessarily obey
> +  the dma-fence rule of signaling within a "reasonable amount of time".
> +  The kernel should thus avoid waiting for memory fences with locks held.
> +
> +* ``long-running workload``: A workload that may take more than the
> +  current stipulated dma-fence maximum signal delay to complete and
> +  which therefore needs to set the gpu_vm or the GPU execution context in
> +  a certain mode that disallows completion dma-fences.
> +
> +* ``exec function``: An exec function is a function that 

Re: [RFC PATCH v1 01/12] Revert "drm/sysfs: Link DRM connectors to corresponding Type-C connectors"

2023-09-12 Thread Dmitry Baryshkov

On 12/09/2023 14:05, Heikki Krogerus wrote:

On Tue, Sep 12, 2023 at 12:15:10AM +0300, Dmitry Baryshkov wrote:

On 06/09/2023 16:38, Heikki Krogerus wrote:

On Wed, Sep 06, 2023 at 03:48:35PM +0300, Dmitry Baryshkov wrote:

On Wed, 6 Sept 2023 at 15:44, Heikki Krogerus
 wrote:


On Tue, Sep 05, 2023 at 01:56:59PM +0300, Dmitry Baryshkov wrote:

Hi Heikki,

On Tue, 5 Sept 2023 at 11:50, Heikki Krogerus
 wrote:


Hi Dmitry,

On Mon, Sep 04, 2023 at 12:41:39AM +0300, Dmitry Baryshkov wrote:

The kdev->fwnode pointer is never set in drm_sysfs_connector_add(), so
dev_fwnode() checks never succeed, making the respective commit NOP.


That's not true. The dev->fwnode is assigned when the device is
created on ACPI platforms automatically. If the drm_connector fwnode
member is assigned before the device is registered, then that fwnode
is assigned also to the device - see drm_connector_acpi_find_companion().

But please note that even if drm_connector does not have anything in
its fwnode member, the device may still be assigned fwnode, just based
on some other logic (maybe in drivers/acpi/acpi_video.c?).


And if drm_sysfs_connector_add() is modified to set kdev->fwnode, it
breaks drivers already using components (as it was pointed at [1]),
resulting in a deadlock. Lockdep trace is provided below.

Granted these two issues, it seems impractical to fix this commit in any
sane way. Revert it instead.


I think there is already user space stuff that relies on these links,
so I'm not sure you can just remove them like that. If the component
framework is not the correct tool here, then I think you need to
suggest some other way of creating them.


The issue (that was pointed out during review) is that having a
component code in the framework code can lead to lockups. With the
patch #2 in place (which is the only logical way to set kdev->fwnode
for non-ACPI systems) probing of drivers which use components and set
drm_connector::fwnode breaks immediately.

Can we move the component part to the respective drivers? With the
patch 2 in place, connector->fwnode will be copied to the created
kdev's fwnode pointer.

Another option might be to make this drm_sysfs component registration optional.


You don't need to use the component framework at all if there is
a better way of determining the connection between the DP and its
Type-C connector (I'm assuming that that's what this series is about).
You just need the symlinks, not the component.


The problem is that right now this component registration has become
mandatory. And if I set the kdev->fwnode manually (like in the patch
2), the kernel hangs inside the component code.
That's why I proposed to move the components to the place where they
are really necessary, e.g. i915 and amd drivers.


So why can't we replace the component with the method you are
proposing in this series of finding out the Type-C port also with
i915, AMD, or whatever driver and platform (that's the only thing that
component is used for)?


The drm/msm driver uses drm_bridge for the pipeline (including the last DP
entry) and the drm_bridge_connector to create the connector. I think that
enabling i915 and AMD drivers to use drm_bridge fells out of scope for this
series.



Determining the connection between a DP and its Type-C connector is
starting to get really important, so ideally we have a common solution
for that.


Yes. This is what we have been discussing with Simon for quite some time on
#dri-devel.

Unfortunately I think the solution that got merged was pretty much rushed
in instead of being well thought out. For example, it is also not always
possible to provide the drm_connector / typec_connector links (as you can
see from patch 7). Sometimes we can only express that this is a Type-C DP
connector, but we cannot easily point it to the particular USB-C port.

So, I'm not sure how we can proceed here. The currently merged patch breaks
drm/msm if we even try to use it by setting kdev->fwnode to
drm_connector->fwnode. The pointed-out `drivers/usb/typec/port-mapper.c` is
an ACPI-only thing, which is not expected to work in non-ACPI cases.


You really have to always supply not only the Type-C ports and partners,
but also the alt modes. You need them, firstly to keep things sane
inside kernel, but more importantly, so they are always exposed to the
user space, AND, always the same way. We have ABIs for all this stuff,
including the DP alt mode. Use them. No shortcuts.

So here's what you need to do. UCSI does not seem to bring you
anything useful, so just disable it for now. You don't need it. Your
port driver is clearly drivers/soc/qcom/pmic_glink_altmode.c, so
that's where you need to register all these components - the ports,
partners and alt modes. You have all the needed information there.


To make things even more complicated, UCSI is necessary for the USB part 
of the story. It handles vbus and direction.



Only after you've done that we can start to look at how should the
connection between th

Re: [Freedreno] [PATCH] drm/msm/dp: skip validity check for DP CTS EDID checksum

2023-09-12 Thread Jani Nikula
On Tue, 12 Sep 2023, Abhinav Kumar  wrote:
> Hi Jani
>
> On 9/12/2023 5:16 AM, Jani Nikula wrote:
>> On Thu, 07 Sep 2023, Stephen Boyd  wrote:
>>> Quoting Jani Nikula (2023-09-01 07:20:34)
 The DP CTS test for EDID last block checksum expects the checksum for
 the last block, invalid or not. Skip the validity check.

 For the most part (*), the EDIDs returned by drm_get_edid() will be
 valid anyway, and there's the CTS workaround to get the checksum for
 completely invalid EDIDs. See commit 7948fe12d47a ("drm/msm/dp: return
 correct edid checksum after corrupted edid checksum read").

 This lets us remove one user of drm_edid_block_valid() with hopes the
 function can be removed altogether in the future.

 (*) drm_get_edid() ignores checksum errors on CTA extensions.

 Cc: Abhinav Kumar 
 Cc: Dmitry Baryshkov 
 Cc: Kuogee Hsieh 
 Cc: Marijn Suijten 
 Cc: Rob Clark 
 Cc: Sean Paul 
 Cc: Stephen Boyd 
 Cc: linux-arm-...@vger.kernel.org
 Cc: freedr...@lists.freedesktop.org
 Signed-off-by: Jani Nikula 
 ---
>>>
>>> Reviewed-by: Stephen Boyd 
>> 
>> Thanks; is that enough to merge? I can't claim I would have been able to
>> test this.
>> 
>
> Reviewed-by: Abhinav Kumar 
>
> Change looks fine.
>
> We can pick this up in the MSM tree if you would like.

I'd appreciate that, thanks!

BR,
Jani.

>
> Dmitry, you can please pick this up along with my R-b and Kuogee's R-b 
> as well.
>
> I think his R-b got misformatted. I can ask him to add that again.
>
>>>

 diff --git a/drivers/gpu/drm/msm/dp/dp_panel.c 
 b/drivers/gpu/drm/msm/dp/dp_panel.c
 index 42d52510ffd4..86a8e06c7a60 100644
 --- a/drivers/gpu/drm/msm/dp/dp_panel.c
 +++ b/drivers/gpu/drm/msm/dp/dp_panel.c
 @@ -289,26 +289,9 @@ int dp_panel_get_modes(struct dp_panel *dp_panel,

   static u8 dp_panel_get_edid_checksum(struct edid *edid)
>>>
>>> It would be nice to make 'edid' const here in another patch.
>> 
>> Sure.
>> 
>> BR,
>> Jani.
>> 
>> 

-- 
Jani Nikula, Intel
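For context, an EDID block's checksum byte is defined so that all 128 bytes of the block sum to 0 (mod 256), and what the CTS test asks for is simply the last byte of the last block, valid or not. A minimal userspace sketch of that arithmetic (illustrative only, not the kernel's helper):

```c
#include <assert.h>
#include <stdint.h>

/* An EDID block is 128 bytes; the final byte is chosen so that all 128
 * bytes sum to 0 (mod 256). The value the CTS test wants back is simply
 * byte 127 of the last block, whether or not the block is valid. */
static uint8_t edid_block_checksum(const uint8_t block[128])
{
	return block[127];
}

/* Recompute what the checksum byte would have to be for this block. */
static uint8_t edid_block_compute_checksum(const uint8_t block[128])
{
	unsigned int sum = 0;
	int i;

	for (i = 0; i < 127; i++)
		sum += block[i];
	return (uint8_t)(0x100 - (sum & 0xff));
}
```

With the checksum byte filled in this way, summing the whole block modulo 256 yields zero, which is what a validity check would verify.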


Re: [PATCH v3 9/9] drm: ci: Use scripts/config to enable/disable configs

2023-09-12 Thread Helen Koike




On 08/09/2023 12:22, Vignesh Raman wrote:

Instead of modifying files in git to enable/disable
configs, use scripts/config on the .config file which
will be used for building the kernel.

Suggested-by: Jani Nikula 
Signed-off-by: Vignesh Raman 


Acked-by: Helen Koike 


---

v2:
   - Added a new patch in the series to use scripts/config to enable/disable 
configs

v3:
   - No changes

---
  drivers/gpu/drm/ci/build.sh | 14 +++---
  1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/ci/build.sh b/drivers/gpu/drm/ci/build.sh
index 092c195af242..093929a115de 100644
--- a/drivers/gpu/drm/ci/build.sh
+++ b/drivers/gpu/drm/ci/build.sh
@@ -70,19 +70,19 @@ if [ -z "$CI_MERGE_REQUEST_PROJECT_PATH" ]; then
  fi
  fi
  
-for opt in $ENABLE_KCONFIGS; do

-  echo CONFIG_$opt=y >> drivers/gpu/drm/ci/${KERNEL_ARCH}.config
-done
-for opt in $DISABLE_KCONFIGS; do
-  echo CONFIG_$opt=n >> drivers/gpu/drm/ci/${KERNEL_ARCH}.config
-done
-
  if [[ -n "${MERGE_FRAGMENT}" ]]; then
  ./scripts/kconfig/merge_config.sh ${DEFCONFIG} 
drivers/gpu/drm/ci/${MERGE_FRAGMENT}
  else
  make `basename ${DEFCONFIG}`
  fi
  
+for opt in $ENABLE_KCONFIGS; do

+./scripts/config --enable CONFIG_$opt
+done
+for opt in $DISABLE_KCONFIGS; do
+./scripts/config --disable CONFIG_$opt
+done
+
  make ${KERNEL_IMAGE_NAME}
  
  mkdir -p /lava-files/
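The ordering change matters because `make defconfig` / `merge_config.sh` regenerate `.config`, so the toggles must be applied afterwards to survive. A tiny self-contained illustration of the same flip semantics on a throwaway config file (stand-in `sed`/`echo` commands, not the kernel's `scripts/config`):

```shell
# Build a fake base config, then apply enable/disable toggles to it,
# mimicking what ./scripts/config --enable/--disable do to .config.
cfg=$(mktemp)
printf 'CONFIG_FOO=y\nCONFIG_BAR=y\n' > "$cfg"

# disable: flip "CONFIG_BAR=y" to the "is not set" form
sed -i 's/^CONFIG_BAR=y$/# CONFIG_BAR is not set/' "$cfg"
# enable: add a new symbol
echo 'CONFIG_BAZ=y' >> "$cfg"

grep -c '=y' "$cfg"    # two symbols remain enabled
```

Appending `CONFIG_*` lines to the defconfig fragment before the config is generated, as the old code did, risks the toggles being dropped or overridden during generation; editing the final `.config` avoids that.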


Re: [PATCH v3 8/9] drm: ci: Enable new jobs

2023-09-12 Thread Helen Koike




On 08/09/2023 12:22, Vignesh Raman wrote:

Enable the following jobs, as the issues noted in the
TODO comments have been resolved. This will ensure that these jobs
are now included and executed as part of the CI/CD pipeline.

msm:apq8016:
TODO: current issue: it is not fiding the NFS root. Fix and remove this rule.

mediatek:mt8173:
TODO: current issue: device is hanging. Fix and remove this rule.

virtio_gpu:none:
TODO: current issue: malloc(): corrupted top size. Fix and remove this rule.

Signed-off-by: Vignesh Raman 


Acked-by: Helen Koike 


---

v2:
   - Reworded the commit message

v3:
   - No changes

---
  drivers/gpu/drm/ci/test.yml | 9 -
  1 file changed, 9 deletions(-)

diff --git a/drivers/gpu/drm/ci/test.yml b/drivers/gpu/drm/ci/test.yml
index d85add39f425..1771af21e2d9 100644
--- a/drivers/gpu/drm/ci/test.yml
+++ b/drivers/gpu/drm/ci/test.yml
@@ -108,9 +108,6 @@ msm:apq8016:
  RUNNER_TAG: google-freedreno-db410c
script:
  - ./install/bare-metal/fastboot.sh
-  rules:
-# TODO: current issue: it is not fiding the NFS root. Fix and remove this 
rule.
-- when: never
  
  msm:apq8096:

extends:
@@ -273,9 +270,6 @@ mediatek:mt8173:
  DEVICE_TYPE: mt8173-elm-hana
  GPU_VERSION: mt8173
  RUNNER_TAG: mesa-ci-x86-64-lava-mt8173-elm-hana
-  rules:
-# TODO: current issue: device is hanging. Fix and remove this rule.
-- when: never
  
  mediatek:mt8183:

extends:
@@ -333,6 +327,3 @@ virtio_gpu:none:
  - debian/x86_64_test-gl
  - testing:x86_64
  - igt:x86_64
-  rules:
-# TODO: current issue: malloc(): corrupted top size. Fix and remove this 
rule.
-- when: never
\ No newline at end of file


Re: [PATCH v2 1/9] drm/sched: Convert drm scheduler to use a work queue rather than kthread

2023-09-12 Thread Danilo Krummrich

On 9/12/23 17:13, Boris Brezillon wrote:

On Tue, 12 Sep 2023 16:49:09 +0200
Boris Brezillon  wrote:


On Tue, 12 Sep 2023 16:33:01 +0200
Danilo Krummrich  wrote:


On 9/12/23 16:28, Boris Brezillon wrote:

On Thu, 17 Aug 2023 13:13:31 +0200
Danilo Krummrich  wrote:
 

I think that's a misunderstanding. I'm not trying to say that it is
*always* beneficial to fill up the ring as much as possible. But I think
it is under certain circumstances, exactly those circumstances I
described for Nouveau.

As mentioned, in Nouveau the size of a job is only really limited by the
ring size, which means that one job can (but does not necessarily) fill
up the whole ring. We both agree that this is inefficient, because it
potentially results in the HW running dry due to hw_submission_limit == 1.

I recognize you said that one should define hw_submission_limit and
adjust the other parts of the equation accordingly, the options I see are:

(1) Increase the ring size while keeping the maximum job size.
(2) Decrease the maximum job size while keeping the ring size.
(3) Let the scheduler track the actual job size rather than the maximum
job size.

(1) results in potentially wasted ring memory, because we're not
always reaching the maximum job size, but the scheduler assumes so.

(2) results in more IOCTLs from userspace for the same amount of IBs
and more jobs result in more memory allocations and more work being
submitted to the workqueue (with Matt's patches).

(3) doesn't seem to have any of those drawbacks.

What would be your take on that?

Actually, if none of the other drivers is interested in a more precise
way of keeping track of the ring utilization, I'd be totally fine to do
it in a driver specific way. However, unfortunately I don't see how this
would be possible.


I'm not entirely sure, but I think PowerVR is pretty close to your
description: job sizes are dynamic, and the ring buffer size is
picked by the driver at queue initialization time. What we did was to
set hw_submission_limit to an arbitrarily high value of 64k (we could
have used something like ringbuf_size/min_job_size instead), and then
have the control flow implemented with ->prepare_job() [1] (CCCB is the
PowerVR ring buffer). This allows us to maximize ring buffer utilization
while still allowing dynamic-size jobs.


I guess this would work, but I think it would be better to bake this in,
especially if more drivers do have this need. I already have an
implementation [1] for doing that in the scheduler. My plan was to push
that as soon as Matt sends out V3.

[1] 
https://gitlab.freedesktop.org/nouvelles/kernel/-/commit/269f05d6a2255384badff8b008b3c32d640d2d95


PowerVR's ->can_fit_in_ringbuf() logic is a bit more involved in that
native fence waits are passed to the FW, and those add to the job size.
When we know our job is ready for execution (all non-native deps are
signaled), we evict already signaled native-deps (or native fences) to
shrink the job size further, but that's something we need to
calculate late if we want the job size to be minimal. Of course, we can
always over-estimate the job size, but if we go for a full-blown
drm_sched integration, I wonder if it wouldn't be preferable to have a
->get_job_size() callback returning the number of units needed by a job,
and have the core pick 1 when the hook is not implemented.


FWIW, I think last time I asked how to do that, I've been pointed to
->prepare_job() by someone  (don't remember if it was Daniel or
Christian), hence the PowerVR implementation. If that's still the
preferred solution, there's some opportunity to have a generic layer to
automate ringbuf utilization tracking and some helpers to prepare
wait_for_ringbuf dma_fences that drivers could return from
->prepare_job() (those fences would then be signaled when the driver
calls drm_ringbuf_job_done() and the next job waiting for ringbuf space
now fits in the ringbuf).



Not sure I like that, it's basically a different implementation to work
around limitations of an implementation that is supposed to cover this case
in general.
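As a side note, option (3) above — having the scheduler track each job's actual size against the ring capacity instead of assuming a fixed per-job maximum — boils down to a credit counter. A minimal sketch with hypothetical names (this is not the drm_sched API):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical credit-based ring tracking: each job declares how many
 * ring units it occupies; submission is admitted only while the total
 * of in-flight jobs still fits in the ring. */
struct ring_credits {
	unsigned int capacity;   /* total ring units available */
	unsigned int in_flight;  /* units held by submitted jobs */
};

/* Admit a job only if its declared size fits in the free ring space. */
static bool ring_try_submit(struct ring_credits *r, unsigned int job_units)
{
	if (job_units > r->capacity - r->in_flight)
		return false;	/* caller must wait for completions */
	r->in_flight += job_units;
	return true;
}

/* Return a completed job's units to the ring. */
static void ring_job_done(struct ring_credits *r, unsigned int job_units)
{
	r->in_flight -= job_units;
}
```

This is essentially what the ->prepare_job()-based PowerVR scheme and the proposed generic tracking both implement: small jobs can pack the ring densely, while a single ring-sized job still fits.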



Re: [PATCH v3 6/9] arm64: defconfig: Enable DA9211 regulator

2023-09-12 Thread Helen Koike




On 08/09/2023 12:22, Vignesh Raman wrote:

Mediatek mt8173 board fails to boot with DA9211 regulator disabled.
Enabling CONFIG_REGULATOR_DA9211=y in drm-ci fixes the issue.

So enable it in the defconfig since kernel-ci also requires it.


tbh, =m doesn't solve it for mesa-ci (since we don't use an initrd); not 
sure if it solves it for kernel-ci.




Suggested-by: AngeloGioacchino Del Regno 

Signed-off-by: Vignesh Raman 
In any case:

Acked-by: Helen Koike 


---

v3:
   - New patch in the series to enable CONFIG_REGULATOR_DA9211 in defconfig

---
  arch/arm64/configs/defconfig | 1 +
  1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index a25d783dfb95..ef22b532b63a 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -711,6 +711,7 @@ CONFIG_REGULATOR_AXP20X=y
  CONFIG_REGULATOR_BD718XX=y
  CONFIG_REGULATOR_BD9571MWV=y
  CONFIG_REGULATOR_CROS_EC=y
+CONFIG_REGULATOR_DA9211=m


Question for the maintainers: would it be acceptable to make it =y 
instead of =m here? Since this is something required for booting.


Regards,
Helen


  CONFIG_REGULATOR_FAN53555=y
  CONFIG_REGULATOR_GPIO=y
  CONFIG_REGULATOR_HI6421V530=y


Re: [PATCH v3 5/9] drm: ci: Enable regulator

2023-09-12 Thread Helen Koike




On 08/09/2023 12:22, Vignesh Raman wrote:

Enable CONFIG_REGULATOR_DA9211=y to fix mt8173 boot issue.

Signed-off-by: Vignesh Raman 


Acked-by: Helen Koike 


---

v2:
   - No changes

v3:
   - Remove CONFIG_RTC_DRV_MT6397=y as it is already enabled in defconfig

---
  drivers/gpu/drm/ci/arm64.config | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/ci/arm64.config b/drivers/gpu/drm/ci/arm64.config
index 817e18ddfd4f..ca95e141a7ae 100644
--- a/drivers/gpu/drm/ci/arm64.config
+++ b/drivers/gpu/drm/ci/arm64.config
@@ -184,6 +184,7 @@ CONFIG_HW_RANDOM_MTK=y
  CONFIG_MTK_DEVAPC=y
  CONFIG_PWM_MTK_DISP=y
  CONFIG_MTK_CMDQ=y
+CONFIG_REGULATOR_DA9211=y
  
  # For nouveau.  Note that DRM must be a module so that it's loaded after NFS is up to provide the firmware.

  CONFIG_ARCH_TEGRA=y


Re: [PATCH v3 4/9] drm: ci: virtio: Update ci variables

2023-09-12 Thread Helen Koike




On 08/09/2023 12:22, Vignesh Raman wrote:

Update ci variables to fix the below error,
ERROR - Igt error: malloc(): corrupted top size
ERROR - Igt error: Received signal SIGABRT.
ERROR - Igt error: Stack trace:
ERROR - Igt error:  #0 [fatal_sig_handler+0x17b]

Signed-off-by: Vignesh Raman 


Acked-by: Helen Koike 


---

v2:
   - No changes

v3:
   - No changes

---
  drivers/gpu/drm/ci/test.yml | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ci/test.yml b/drivers/gpu/drm/ci/test.yml
index 6473cddaa7a9..d85add39f425 100644
--- a/drivers/gpu/drm/ci/test.yml
+++ b/drivers/gpu/drm/ci/test.yml
@@ -316,8 +316,11 @@ virtio_gpu:none:
stage: virtio-gpu
variables:
  CROSVM_GALLIUM_DRIVER: llvmpipe
-DRIVER_NAME: virtio_gpu
+DRIVER_NAME: virtio
  GPU_VERSION: none
+CROSVM_MEMORY: 12288
+CROSVM_CPU: $FDO_CI_CONCURRENT
+CROSVM_GPU_ARGS: 
"vulkan=true,gles=false,backend=virglrenderer,egl=true,surfaceless=true"
extends:
  - .test-gl
tags:


Re: [PATCH v2 1/9] drm/sched: Convert drm scheduler to use a work queue rather than kthread

2023-09-12 Thread Danilo Krummrich

On 9/12/23 16:49, Boris Brezillon wrote:

On Tue, 12 Sep 2023 16:33:01 +0200
Danilo Krummrich  wrote:


On 9/12/23 16:28, Boris Brezillon wrote:

On Thu, 17 Aug 2023 13:13:31 +0200
Danilo Krummrich  wrote:
   

I think that's a misunderstanding. I'm not trying to say that it is
*always* beneficial to fill up the ring as much as possible. But I think
it is under certain circumstances, exactly those circumstances I
described for Nouveau.

As mentioned, in Nouveau the size of a job is only really limited by the
ring size, which means that one job can (but does not necessarily) fill
up the whole ring. We both agree that this is inefficient, because it
potentially results in the HW running dry due to hw_submission_limit == 1.

I recognize you said that one should define hw_submission_limit and
adjust the other parts of the equation accordingly, the options I see are:

(1) Increase the ring size while keeping the maximum job size.
(2) Decrease the maximum job size while keeping the ring size.
(3) Let the scheduler track the actual job size rather than the maximum
job size.

(1) results in potentially wasted ring memory, because we're not
always reaching the maximum job size, but the scheduler assumes so.

(2) results in more IOCTLs from userspace for the same amount of IBs
and more jobs result in more memory allocations and more work being
submitted to the workqueue (with Matt's patches).

(3) doesn't seem to have any of those drawbacks.

What would be your take on that?

Actually, if none of the other drivers is interested in a more precise
way of keeping track of the ring utilization, I'd be totally fine to do
it in a driver specific way. However, unfortunately I don't see how this
would be possible.


I'm not entirely sure, but I think PowerVR is pretty close to your
description: job sizes are dynamic, and the ring buffer size is
picked by the driver at queue initialization time. What we did was to
set hw_submission_limit to an arbitrarily high value of 64k (we could
have used something like ringbuf_size/min_job_size instead), and then
have the control flow implemented with ->prepare_job() [1] (CCCB is the
PowerVR ring buffer). This allows us to maximize ring buffer utilization
while still allowing dynamic-size jobs.


I guess this would work, but I think it would be better to bake this in,
especially if more drivers do have this need. I already have an
implementation [1] for doing that in the scheduler. My plan was to push
that as soon as Matt sends out V3.

[1] 
https://gitlab.freedesktop.org/nouvelles/kernel/-/commit/269f05d6a2255384badff8b008b3c32d640d2d95


PowerVR's ->can_fit_in_ringbuf() logic is a bit more involved in that
native fence waits are passed to the FW, and those add to the job size.
When we know our job is ready for execution (all non-native deps are
signaled), we evict already signaled native-deps (or native fences) to
shrink the job size further, but that's something we need to
calculate late if we want the job size to be minimal. Of course, we can
always over-estimate the job size, but if we go for a full-blown
drm_sched integration, I wonder if it wouldn't be preferable to have a
->get_job_size() callback returning the number of units needed by a job,
and have the core pick 1 when the hook is not implemented.



Sure, why not. Sounds reasonable to me.



Re: [PATCH v3 3/9] drm: ci: Force db410c to host mode

2023-09-12 Thread Helen Koike




On 08/09/2023 12:22, Vignesh Raman wrote:

Force db410c to host mode to fix network issue which results in failure
to mount root fs via NFS.
See 
https://gitlab.freedesktop.org/gfx-ci/linux/-/commit/cb72a629b8c15c80a54dda510743cefd1c4b65b8

Compile the base device tree with overlay support and use the fdtoverlay
command to merge the base device tree with an overlay which contains the
fix for USB controllers to work in host mode. [suggested by Maxime Ripard]

Suggested-by: Maxime Ripard 
Signed-off-by: Vignesh Raman 


Acked-by: Helen Koike 


---

v2:
   - Use fdtoverlay command to merge overlay dtbo with the base dtb instead of 
modifying the kernel sources

v3:
   - drm-ci scripts to use device tree overlay from arch/arm64/boot/dts/qcom 
and compile base device tree with overlay support

---
  drivers/gpu/drm/ci/build.sh | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ci/build.sh b/drivers/gpu/drm/ci/build.sh
index 7b014287a041..092c195af242 100644
--- a/drivers/gpu/drm/ci/build.sh
+++ b/drivers/gpu/drm/ci/build.sh
@@ -91,7 +91,12 @@ for image in ${KERNEL_IMAGE_NAME}; do
  done
  
  if [[ -n ${DEVICE_TREES} ]]; then

-make dtbs
+make DTC_FLAGS=-@ dtbs
+if [[ -e arch/arm64/boot/dts/qcom/apq8016-sbc.dtb ]]; then
+dtc -@ -I dts -O dtb -o 
arch/arm64/boot/dts/qcom/apq8016-sbc-usb-host.dtbo 
arch/arm64/boot/dts/qcom/apq8016-sbc-usb-host.dtso
+fdtoverlay -i arch/arm64/boot/dts/qcom/apq8016-sbc.dtb -o 
arch/arm64/boot/dts/qcom/apq8016-sbc-usb-host.dtb 
arch/arm64/boot/dts/qcom/apq8016-sbc-usb-host.dtbo
+mv arch/arm64/boot/dts/qcom/apq8016-sbc-usb-host.dtb 
arch/arm64/boot/dts/qcom/apq8016-sbc.dtb
+fi
  cp ${DEVICE_TREES} /lava-files/.
  fi
  


Re: [PATCH v3 2/9] arm64: dts: qcom: apq8016-sbc: Add overlay for usb host mode

2023-09-12 Thread Helen Koike




On 08/09/2023 12:22, Vignesh Raman wrote:

Due to the presence of the fastboot micro cable in the CI farm, the
hardware remains in gadget mode instead of host mode. As a result, it
doesn't find the network, which results in failure to mount the root
fs via NFS.

Add an overlay dtso file that sets the dr_mode to host, allowing
the USB controllers to work in host mode. This dtso file will be used
in drm-ci, mesa-ci.

Overlay DT file uses the sugar syntax [suggested by Dmitry Baryshkov and Maxime 
Ripard]

Suggested-by: Dmitry Baryshkov 
Suggested-by: Maxime Ripard 
Signed-off-by: Helen Koike 
Signed-off-by: David Heidelberg 
Signed-off-by: Vignesh Raman 


Acked-by: Helen Koike 


---

v3:
   - New patch in the series to add device tree overlay in 
arch/arm64/boot/dts/qcom

---
  arch/arm64/boot/dts/qcom/apq8016-sbc-usb-host.dtso | 8 
  1 file changed, 8 insertions(+)
  create mode 100644 arch/arm64/boot/dts/qcom/apq8016-sbc-usb-host.dtso

diff --git a/arch/arm64/boot/dts/qcom/apq8016-sbc-usb-host.dtso 
b/arch/arm64/boot/dts/qcom/apq8016-sbc-usb-host.dtso
new file mode 100644
index ..a82c26b7eae8
--- /dev/null
+++ b/arch/arm64/boot/dts/qcom/apq8016-sbc-usb-host.dtso
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+
+/dts-v1/;
+/plugin/;
+
+&usb {
+ dr_mode = "host";
+};


Re: [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation

2023-09-12 Thread Danilo Krummrich
On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote:
> Hi, Danilo,
> 
> On 9/9/23 17:31, Danilo Krummrich wrote:
> > So far the DRM GPUVA manager offers common infrastructure to track GPU VA
> > allocations and mappings, generically connect GPU VA mappings to their
> > backing buffers and perform more complex mapping operations on the GPU VA
> > space.
> > 
> > However, there are more design patterns commonly used by drivers, which
> > can potentially be generalized in order to make the DRM GPUVA manager
> > represent a basic GPU-VM implementation. In this context, this patch aims
> > at generalizing the following elements.
> > 
> > 1) Provide a common dma-resv for GEM objects not being used outside of
> > this GPU-VM.
> > 
> > 2) Provide tracking of external GEM objects (GEM objects which are
> > shared with other GPU-VMs).
> > 
> > 3) Provide functions to efficiently lock all GEM objects dma-resv the
> > GPU-VM contains mappings of.
> > 
> > 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
> > of, such that validation of evicted GEM objects is accelerated.
> > 
> > 5) Provide some convinience functions for common patterns.
> > 
> > Rather than being designed as a "framework", the target is to make all
> > features appear as a collection of optional helper functions, such that
> > drivers are free to make use of the DRM GPUVA managers basic
> > functionality and opt-in for other features without setting any feature
> > flags, just by making use of the corresponding functions.
> > 
> > Big kudos to Boris Brezillon for his help to figure out locking for drivers
> > updating the GPU VA space within the fence signalling path.
> > 
> > Suggested-by: Matthew Brost 
> > Signed-off-by: Danilo Krummrich 
> > ---
> >   drivers/gpu/drm/drm_gpuvm.c | 516 
> >   include/drm/drm_gpuvm.h | 197 ++
> >   2 files changed, 713 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> > index f4411047dbb3..8e62a043f719 100644
> > --- a/drivers/gpu/drm/drm_gpuvm.c
> > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > @@ -73,6 +73,21 @@
> >* &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
> >* particular combination. If not existent a new instance is created and 
> > linked
> >* to the &drm_gem_object.
> > + *
> > + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also 
> > used
> > + * as entry for the &drm_gpuvm's lists of external and evicted objects. 
> > Those
> > + * list are maintained in order to accelerate locking of dma-resv locks and
> > + * validation of evicted objects bound in a &drm_gpuvm. For instance the 
> > all
> > + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by 
> > calling
> > + * drm_gpuvm_exec_lock(). Once locked drivers can call 
> > drm_gpuvm_validate() in
> > + * order to validate all evicted &drm_gem_objects. It is also possible to 
> > lock
> > + * additional &drm_gem_objects by providing the corresponding parameters to
> > + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while 
> > making
> > + * use of helper functions such as drm_gpuvm_prepare_range() or
> > + * drm_gpuvm_prepare_objects().
> > + *
> > + * Every bound &drm_gem_object is treated as external object when its 
> > &dma_resv
> > + * structure is different than the &drm_gpuvm's common &dma_resv structure.
> >*/
> >   /**
> > @@ -420,6 +435,20 @@
> >* Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
> >* &drm_gem_object must be able to observe previous creations and 
> > destructions
> >* of &drm_gpuvm_bos in order to keep instances unique.
> > + *
> > + * The &drm_gpuvm's lists for keeping track of external and evicted 
> > objects are
> > + * protected against concurrent insertion / removal and iteration 
> > internally.
> > + *
> > + * However, drivers still need ensure to protect concurrent calls to 
> > functions
> > + * iterating those lists, such as drm_gpuvm_validate() and
> > + * drm_gpuvm_prepare_objects(). Every such function contains a particular
> > + * comment and lockdep checks if possible.
> > + *
> > + * Functions adding or removing entries from those lists, such as
> > + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be called with 
> > external
> > + * locks being held, e.g. in order to avoid the corresponding list to be
> > + * (safely) modified while potentially being iternated by other API 
> > functions.
> > + * However, this is entirely optional.
> >*/
> >   /**
> > @@ -632,6 +661,131 @@
> >*}
> >*/
> > +/**
> > + * get_next_vm_bo_from_list() - get the next vm_bo element
> > + * @__gpuvm: The GPU VM
> > + * @__list_name: The name of the list we're iterating on
> > + * @__local_list: A pointer to the local list used to store already 
> > iterated items
> > + * @__prev_vm_bo: The previous element we got from 
> > drm_gpuvm_get_n

Re: [PATCH v3 1/9] drm: ci: igt_runner: Remove todo

2023-09-12 Thread Helen Koike




On 08/09/2023 12:22, Vignesh Raman wrote:

/sys/kernel/debug/dri/*/state exist for every atomic KMS driver.
We do not test non-atomic drivers, so remove the todo.

Signed-off-by: Vignesh Raman 


Acked-by: Helen Koike 


---

v2:
   - No changes

v3:
   - No changes
   
---

  drivers/gpu/drm/ci/igt_runner.sh | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/ci/igt_runner.sh b/drivers/gpu/drm/ci/igt_runner.sh
index 2bb759165063..5bf130ac57c9 100755
--- a/drivers/gpu/drm/ci/igt_runner.sh
+++ b/drivers/gpu/drm/ci/igt_runner.sh
@@ -15,7 +15,6 @@ cat /sys/kernel/debug/device_component/*
  '
  
  # Dump drm state to confirm that kernel was able to find a connected display:

-# TODO this path might not exist for all drivers.. maybe run modetest instead?
  set +e
  cat /sys/kernel/debug/dri/*/state
  set -e


[PATCH 7/8] drm/i915/dsc: Add debugfs entry to validate DSC fractional bpp

2023-09-12 Thread Mitul Golani
From: Swati Sharma 

A DSC_Sink_BPP_Precision entry is added to i915_dsc_fec_support_show
to report the sink's precision.
Also, a new debugfs entry is created to enforce fractional bpp.
If Force_DSC_Fractional_BPP_en is set, then while iterating over
output bpp with a fractional step size we will continue if output_bpp is
computed as an integer. With this approach, we will be able to validate
DSC with fractional bpp.

v2:
Add drm_modeset_unlock to new line(Suraj)

Signed-off-by: Swati Sharma 
Signed-off-by: Ankit Nautiyal 
Signed-off-by: Mitul Golani 
Reviewed-by: Suraj Kandpal 
---
 .../drm/i915/display/intel_display_debugfs.c  | 83 +++
 .../drm/i915/display/intel_display_types.h|  1 +
 2 files changed, 84 insertions(+)

diff --git a/drivers/gpu/drm/i915/display/intel_display_debugfs.c 
b/drivers/gpu/drm/i915/display/intel_display_debugfs.c
index f05b52381a83..776ab96def1f 100644
--- a/drivers/gpu/drm/i915/display/intel_display_debugfs.c
+++ b/drivers/gpu/drm/i915/display/intel_display_debugfs.c
@@ -1244,6 +1244,8 @@ static int i915_dsc_fec_support_show(struct seq_file *m, 
void *data)
  
DP_DSC_YCbCr420_Native)),
   
str_yes_no(drm_dp_dsc_sink_supports_format(intel_dp->dsc_dpcd,
  
DP_DSC_YCbCr444)));
+   seq_printf(m, "DSC_Sink_BPP_Precision: %d\n",
+  drm_dp_dsc_sink_bpp_incr(intel_dp->dsc_dpcd));
seq_printf(m, "Force_DSC_Enable: %s\n",
   str_yes_no(intel_dp->force_dsc_en));
if (!intel_dp_is_edp(intel_dp))
@@ -1436,6 +1438,84 @@ static const struct file_operations 
i915_dsc_output_format_fops = {
.write = i915_dsc_output_format_write
 };
 
+static int i915_dsc_fractional_bpp_show(struct seq_file *m, void *data)
+{
+   struct drm_connector *connector = m->private;
+   struct drm_device *dev = connector->dev;
+   struct drm_crtc *crtc;
+   struct intel_dp *intel_dp;
+   struct intel_encoder *encoder = 
intel_attached_encoder(to_intel_connector(connector));
+   int ret;
+
+   if (!encoder)
+   return -ENODEV;
+
+   ret = 
drm_modeset_lock_single_interruptible(&dev->mode_config.connection_mutex);
+   if (ret)
+   return ret;
+
+   crtc = connector->state->crtc;
+   if (connector->status != connector_status_connected || !crtc) {
+   ret = -ENODEV;
+   goto out;
+   }
+
+   intel_dp = intel_attached_dp(to_intel_connector(connector));
+   seq_printf(m, "Force_DSC_Fractional_BPP_Enable: %s\n",
+  str_yes_no(intel_dp->force_dsc_fractional_bpp_en));
+
+out:
+   drm_modeset_unlock(&dev->mode_config.connection_mutex);
+
+   return ret;
+}
+
+static ssize_t i915_dsc_fractional_bpp_write(struct file *file,
+const char __user *ubuf,
+size_t len, loff_t *offp)
+{
+   struct drm_connector *connector =
+   ((struct seq_file *)file->private_data)->private;
+   struct intel_encoder *encoder = 
intel_attached_encoder(to_intel_connector(connector));
+   struct drm_i915_private *i915 = to_i915(encoder->base.dev);
+   struct intel_dp *intel_dp = enc_to_intel_dp(encoder);
+   bool dsc_fractional_bpp_enable = false;
+   int ret;
+
+   if (len == 0)
+   return 0;
+
+   drm_dbg(&i915->drm,
+   "Copied %zu bytes from user to force fractional bpp for DSC\n", 
len);
+
+   ret = kstrtobool_from_user(ubuf, len, &dsc_fractional_bpp_enable);
+   if (ret < 0)
+   return ret;
+
+   drm_dbg(&i915->drm, "Got %s for DSC Fractional BPP Enable\n",
+   (dsc_fractional_bpp_enable) ? "true" : "false");
+   intel_dp->force_dsc_fractional_bpp_en = dsc_fractional_bpp_enable;
+
+   *offp += len;
+
+   return len;
+}
+
+static int i915_dsc_fractional_bpp_open(struct inode *inode,
+   struct file *file)
+{
+   return single_open(file, i915_dsc_fractional_bpp_show, 
inode->i_private);
+}
+
+static const struct file_operations i915_dsc_fractional_bpp_fops = {
+   .owner = THIS_MODULE,
+   .open = i915_dsc_fractional_bpp_open,
+   .read = seq_read,
+   .llseek = seq_lseek,
+   .release = single_release,
+   .write = i915_dsc_fractional_bpp_write
+};
+
 /*
  * Returns the Current CRTC's bpc.
  * Example usage: cat /sys/kernel/debug/dri/0/crtc-0/i915_current_bpc
@@ -1513,6 +1593,9 @@ void intel_connector_debugfs_add(struct intel_connector 
*intel_connector)
 
debugfs_create_file("i915_dsc_output_format", 0644, root,
connector, &i915_dsc_output_format_fops);
+
+   debugfs_create_file("i915_dsc_fractional_bpp", 0644, root,
+  

[PATCH 5/8] drm/i915/dsc/mtl: Add support for fractional bpp

2023-09-12 Thread Mitul Golani
From: Vandita Kulkarni 

Consider the fractional bpp while reading the qp values.

v2: Use helpers for fractional, integral bits of bits_per_pixel. (Suraj)

Signed-off-by: Vandita Kulkarni 
Signed-off-by: Ankit Nautiyal 
Reviewed-by: Suraj Kandpal 
---
 .../gpu/drm/i915/display/intel_qp_tables.c|  3 ---
 drivers/gpu/drm/i915/display/intel_vdsc.c | 25 +++
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_qp_tables.c 
b/drivers/gpu/drm/i915/display/intel_qp_tables.c
index 543cdc46aa1d..600c815e37e4 100644
--- a/drivers/gpu/drm/i915/display/intel_qp_tables.c
+++ b/drivers/gpu/drm/i915/display/intel_qp_tables.c
@@ -34,9 +34,6 @@
  * These qp tables are as per the C model
  * and it has the rows pointing to bpps which increment
  * in steps of 0.5
- * We do not support fractional bpps as of today,
- * hence we would skip the fractional bpps during
- * our references for qp calclulations.
  */
 static const u8 
rc_range_minqp444_8bpc[DSC_NUM_BUF_RANGES][RC_RANGE_QP444_8BPC_MAX_NUM_BPP] = {
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
diff --git a/drivers/gpu/drm/i915/display/intel_vdsc.c 
b/drivers/gpu/drm/i915/display/intel_vdsc.c
index 1bd9391a6f5a..2c19078fbce8 100644
--- a/drivers/gpu/drm/i915/display/intel_vdsc.c
+++ b/drivers/gpu/drm/i915/display/intel_vdsc.c
@@ -78,8 +78,8 @@ intel_vdsc_set_min_max_qp(struct drm_dsc_config *vdsc_cfg, 
int buf,
 static void
 calculate_rc_params(struct drm_dsc_config *vdsc_cfg)
 {
+   int bpp = intel_fractional_bpp_from_x16(vdsc_cfg->bits_per_pixel);
int bpc = vdsc_cfg->bits_per_component;
-   int bpp = vdsc_cfg->bits_per_pixel >> 4;
int qp_bpc_modifier = (bpc - 8) * 2;
int uncompressed_bpg_rate;
int first_line_bpg_offset;
@@ -149,7 +149,13 @@ calculate_rc_params(struct drm_dsc_config *vdsc_cfg)
static const s8 ofs_und8[] = {
10, 8, 6, 4, 2, 0, -2, -4, -6, -8, -10, -10, -12, -12, 
-12
};
-
+   /*
+* For 420 format since bits_per_pixel (bpp) is set to target 
bpp * 2,
+* QP table values for target bpp 4.0 to 4.4375 (rounded to 
4.0) are
+* actually for bpp 8 to 8.875 (rounded to 4.0 * 2 i.e 8).
+* Similarly values for target bpp 4.5 to 4.8375 (rounded to 
4.5)
+* are for bpp 9 to 9.875 (rounded to 4.5 * 2 i.e 9), and so on.
+*/
bpp_i  = bpp - 8;
for (buf_i = 0; buf_i < DSC_NUM_BUF_RANGES; buf_i++) {
u8 range_bpg_offset;
@@ -179,6 +185,9 @@ calculate_rc_params(struct drm_dsc_config *vdsc_cfg)
range_bpg_offset & DSC_RANGE_BPG_OFFSET_MASK;
}
} else {
+   /* fractional bpp part * 1 (for precision up to 4 decimal 
places) */
+   int fractional_bits = 
intel_fractional_bpp_decimal(vdsc_cfg->bits_per_pixel);
+
static const s8 ofs_und6[] = {
0, -2, -2, -4, -6, -6, -8, -8, -8, -10, -10, -12, -12, 
-12, -12
};
@@ -192,7 +201,14 @@ calculate_rc_params(struct drm_dsc_config *vdsc_cfg)
10, 8, 6, 4, 2, 0, -2, -4, -6, -8, -10, -10, -12, -12, -12
};
 
-   bpp_i  = (2 * (bpp - 6));
+   /*
+    * QP table rows have values in increment of 0.5.
+    * So 6.0 bpp to 6.4375 will have index 0, 6.5 to 6.9375 will have index 1,
+    * and so on.
+    * 0.5 fractional part with 4 decimal precision becomes 5000
+    */
+   bpp_i  = 2 * (bpp - 6) + (fractional_bits < 5000 ? 0 : 1);
+
for (buf_i = 0; buf_i < DSC_NUM_BUF_RANGES; buf_i++) {
u8 range_bpg_offset;
 
@@ -280,8 +296,7 @@ int intel_dsc_compute_params(struct intel_crtc_state *pipe_config)
/* Gen 11 does not support VBR */
vdsc_cfg->vbr_enable = false;
 
-   /* Gen 11 only supports integral values of bpp */
-   vdsc_cfg->bits_per_pixel = compressed_bpp << 4;
+   vdsc_cfg->bits_per_pixel = pipe_config->dsc.compressed_bpp_x16;
 
/*
 * According to DSC 1.2 specs in Section 4.1 if native_420 is set
-- 
2.25.1



[PATCH 8/8] drm/i915/dsc: Allow DSC only with fractional bpp when forced from debugfs

2023-09-12 Thread Mitul Golani
From: Swati Sharma 

If force_dsc_fractional_bpp_en is set through debugfs allow DSC iff
compressed bpp is fractional. Continue if the computed compressed bpp
turns out to be an integer.

v2:
-Use helpers for fractional, integral bits of bits_per_pixel. (Suraj)
-Fix comment (Suraj)

Signed-off-by: Swati Sharma 
Reviewed-by: Suraj Kandpal 
---
 drivers/gpu/drm/i915/display/intel_dp.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
index 25d033622439..477d5f061633 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -1905,6 +1905,9 @@ xelpd_dsc_compute_link_config(struct intel_dp *intel_dp,
for (compressed_bppx16 = dsc_max_bpp;
 compressed_bppx16 >= dsc_min_bpp;
 compressed_bppx16 -= bppx16_step) {
+   if (intel_dp->force_dsc_fractional_bpp_en &&
+   !intel_fractional_bpp_decimal(compressed_bppx16))
+   continue;
ret = dsc_compute_link_config(intel_dp,
  pipe_config,
  limits,
@@ -1912,6 +1915,10 @@ xelpd_dsc_compute_link_config(struct intel_dp *intel_dp,
  timeslots);
if (ret == 0) {
pipe_config->dsc.compressed_bpp_x16 = compressed_bppx16;
+   if (intel_dp->force_dsc_fractional_bpp_en &&
+   intel_fractional_bpp_decimal(compressed_bppx16))
drm_dbg_kms(&i915->drm, "Forcing DSC fractional bpp\n");
+
return 0;
}
}
-- 
2.25.1



[PATCH 6/8] drm/i915/dp: Iterate over output bpp with fractional step size

2023-09-12 Thread Mitul Golani
From: Ankit Nautiyal 

This patch adds support to iterate over compressed output bpp as per the
fractional step size supported by the DP sink.

v2:
-Avoid ending up with compressed bpp, same as pipe bpp. (Stan)

Signed-off-by: Ankit Nautiyal 
Reviewed-by: Suraj Kandpal 
---
 drivers/gpu/drm/i915/display/intel_dp.c | 37 +++--
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
index a359a8d65dbd..25d033622439 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -1716,15 +1716,15 @@ static bool intel_dp_dsc_supports_format(struct intel_dp *intel_dp,
return drm_dp_dsc_sink_supports_format(intel_dp->dsc_dpcd, sink_dsc_format);
 }
 
-static bool is_bw_sufficient_for_dsc_config(u16 compressed_bpp, u32 link_clock,
+static bool is_bw_sufficient_for_dsc_config(u16 compressed_bppx16, u32 link_clock,
u32 lane_count, u32 mode_clock,
enum intel_output_format output_format,
int timeslots)
 {
u32 available_bw, required_bw;
 
-   available_bw = (link_clock * lane_count * timeslots)  / 8;
-   required_bw = compressed_bpp * (intel_dp_mode_to_fec_clock(mode_clock));
+   available_bw = (link_clock * lane_count * timeslots * 16)  / 8;
+   required_bw = compressed_bppx16 * (intel_dp_mode_to_fec_clock(mode_clock));
 
return available_bw > required_bw;
 }
@@ -1732,7 +1732,7 @@ static bool is_bw_sufficient_for_dsc_config(u16 compressed_bpp, u32 link_clock,
 static int dsc_compute_link_config(struct intel_dp *intel_dp,
   struct intel_crtc_state *pipe_config,
   struct link_config_limits *limits,
-  u16 compressed_bpp,
+  u16 compressed_bppx16,
   int timeslots)
 {
const struct drm_display_mode *adjusted_mode = 
&pipe_config->hw.adjusted_mode;
@@ -1747,8 +1747,8 @@ static int dsc_compute_link_config(struct intel_dp *intel_dp,
for (lane_count = limits->min_lane_count;
 lane_count <= limits->max_lane_count;
 lane_count <<= 1) {
-   if (!is_bw_sufficient_for_dsc_config(compressed_bpp, link_rate, lane_count,
-adjusted_mode->clock,
+   if (!is_bw_sufficient_for_dsc_config(compressed_bppx16, link_rate,
+lane_count, adjusted_mode->clock,
 pipe_config->output_format,
 timeslots))
@@ -1861,7 +1861,7 @@ icl_dsc_compute_link_config(struct intel_dp *intel_dp,
ret = dsc_compute_link_config(intel_dp,
  pipe_config,
  limits,
- valid_dsc_bpp[i],
+ valid_dsc_bpp[i] << 4,
  timeslots);
if (ret == 0) {
pipe_config->dsc.compressed_bpp_x16 = intel_fractional_bpp_to_x16(valid_dsc_bpp[i]);
@@ -1887,22 +1887,31 @@ xelpd_dsc_compute_link_config(struct intel_dp *intel_dp,
  int pipe_bpp,
  int timeslots)
 {
-   u16 compressed_bpp;
+   u8 bppx16_incr = drm_dp_dsc_sink_bpp_incr(intel_dp->dsc_dpcd);
+   struct drm_i915_private *i915 = dp_to_i915(intel_dp);
+   u16 compressed_bppx16;
+   u8 bppx16_step;
int ret;
 
+   if (DISPLAY_VER(i915) < 14 || bppx16_incr <= 1)
+   bppx16_step = 16;
+   else
+   bppx16_step = 16 / bppx16_incr;
+
/* Compressed BPP should be less than the Input DSC bpp */
-   dsc_max_bpp = min(dsc_max_bpp, pipe_bpp - 1);
+   dsc_max_bpp = min(dsc_max_bpp << 4, (pipe_bpp << 4) - bppx16_step);
+   dsc_min_bpp = dsc_min_bpp << 4;
 
-   for (compressed_bpp = dsc_max_bpp;
-compressed_bpp >= dsc_min_bpp;
-compressed_bpp--) {
+   for (compressed_bppx16 = dsc_max_bpp;
+compressed_bppx16 >= dsc_min_bpp;
+compressed_bppx16 -= bppx16_step) {
ret = dsc_compute_link_config(intel_dp,
  pipe_config,
  limits,
- compressed_bpp,
+ compressed_bppx16,
  timeslots);
if (ret == 0) {
- 

[PATCH 2/8] drm/i915/display: Store compressed bpp in U6.4 format

2023-09-12 Thread Mitul Golani
From: Ankit Nautiyal 

DSC parameter bits_per_pixel is stored in U6.4 format.
The 4 bits represent the fractional part of the bpp.
Currently we use compressed_bpp member of dsc structure to store
only the integral part of the bits_per_pixel.
To store the full bits_per_pixel along with the fractional part,
compressed_bpp is changed to store bpp in U6.4 format. The integral
part is retrieved by simply right shifting the member compressed_bpp by 4.

v2:
-Use to_bpp_int, to_bpp_frac_dec, to_bpp_x16 helpers while dealing
 with compressed bpp. (Suraj)
-Fix comment styling. (Suraj)

v3:
-Add separate file for 6.4 fixed point helper(Jani, Nikula)
-Add comment for magic values(Suraj)

Signed-off-by: Ankit Nautiyal 
Signed-off-by: Mitul Golani 
Reviewed-by: Suraj Kandpal 
---
 drivers/gpu/drm/i915/display/icl_dsi.c| 11 +++---
 drivers/gpu/drm/i915/display/intel_audio.c|  3 +-
 drivers/gpu/drm/i915/display/intel_bios.c |  6 ++--
 drivers/gpu/drm/i915/display/intel_cdclk.c|  5 +--
 drivers/gpu/drm/i915/display/intel_display.c  |  2 +-
 .../drm/i915/display/intel_display_types.h|  3 +-
 drivers/gpu/drm/i915/display/intel_dp.c   | 29 ---
 drivers/gpu/drm/i915/display/intel_dp_mst.c   | 22 ++--
 .../i915/display/intel_fractional_helper.h| 36 +++
 drivers/gpu/drm/i915/display/intel_vdsc.c |  5 +--
 10 files changed, 85 insertions(+), 37 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/display/intel_fractional_helper.h

diff --git a/drivers/gpu/drm/i915/display/icl_dsi.c b/drivers/gpu/drm/i915/display/icl_dsi.c
index ad6488e9c2b2..0f7594b6aa1f 100644
--- a/drivers/gpu/drm/i915/display/icl_dsi.c
+++ b/drivers/gpu/drm/i915/display/icl_dsi.c
@@ -43,6 +43,7 @@
 #include "intel_de.h"
 #include "intel_dsi.h"
 #include "intel_dsi_vbt.h"
+#include "intel_fractional_helper.h"
 #include "intel_panel.h"
 #include "intel_vdsc.h"
 #include "intel_vdsc_regs.h"
@@ -330,7 +331,7 @@ static int afe_clk(struct intel_encoder *encoder,
int bpp;
 
if (crtc_state->dsc.compression_enable)
-   bpp = crtc_state->dsc.compressed_bpp;
+   bpp = intel_fractional_bpp_from_x16(crtc_state->dsc.compressed_bpp_x16);
else
bpp = mipi_dsi_pixel_format_to_bpp(intel_dsi->pixel_format);
 
@@ -860,7 +861,7 @@ gen11_dsi_set_transcoder_timings(struct intel_encoder *encoder,
 * compressed and non-compressed bpp.
 */
if (crtc_state->dsc.compression_enable) {
-   mul = crtc_state->dsc.compressed_bpp;
+   mul = intel_fractional_bpp_from_x16(crtc_state->dsc.compressed_bpp_x16);
div = mipi_dsi_pixel_format_to_bpp(intel_dsi->pixel_format);
}
 
@@ -884,7 +885,7 @@ gen11_dsi_set_transcoder_timings(struct intel_encoder *encoder,
int bpp, line_time_us, byte_clk_period_ns;
 
if (crtc_state->dsc.compression_enable)
-   bpp = crtc_state->dsc.compressed_bpp;
+   bpp = intel_fractional_bpp_from_x16(crtc_state->dsc.compressed_bpp_x16);
else
bpp = mipi_dsi_pixel_format_to_bpp(intel_dsi->pixel_format);
 
@@ -1451,8 +1452,8 @@ static void gen11_dsi_get_timings(struct intel_encoder *encoder,
struct drm_display_mode *adjusted_mode =
&pipe_config->hw.adjusted_mode;
 
-   if (pipe_config->dsc.compressed_bpp) {
-   int div = pipe_config->dsc.compressed_bpp;
+   if (pipe_config->dsc.compressed_bpp_x16) {
+   int div = intel_fractional_bpp_from_x16(pipe_config->dsc.compressed_bpp_x16);
int mul = mipi_dsi_pixel_format_to_bpp(intel_dsi->pixel_format);
 
adjusted_mode->crtc_htotal =
diff --git a/drivers/gpu/drm/i915/display/intel_audio.c b/drivers/gpu/drm/i915/display/intel_audio.c
index 19605264a35c..4f1db1581316 100644
--- a/drivers/gpu/drm/i915/display/intel_audio.c
+++ b/drivers/gpu/drm/i915/display/intel_audio.c
@@ -35,6 +35,7 @@
 #include "intel_crtc.h"
 #include "intel_de.h"
 #include "intel_display_types.h"
+#include "intel_fractional_helper.h"
 #include "intel_lpe_audio.h"
 
 /**
@@ -528,7 +529,7 @@ static unsigned int calc_hblank_early_prog(struct intel_encoder *encoder,
h_active = crtc_state->hw.adjusted_mode.crtc_hdisplay;
h_total = crtc_state->hw.adjusted_mode.crtc_htotal;
pixel_clk = crtc_state->hw.adjusted_mode.crtc_clock;
-   vdsc_bpp = crtc_state->dsc.compressed_bpp;
+   vdsc_bpp = intel_fractional_bpp_from_x16(crtc_state->dsc.compressed_bpp_x16);
cdclk = i915->display.cdclk.hw.cdclk;
/* fec= 0.972261, using rounding multiplier of 100 */
fec_coeff = 972261;
diff --git a/drivers/gpu/drm/i915/display/intel_bios.c b/drivers/gpu/drm/i915/display/intel_bios.c
index f735b035436c..df3e80a0cd01 100644
--- a/drivers/gpu/drm/i915/display/intel_bios.c
+++ b/drivers/gpu/drm/i915/display/intel

[PATCH 4/8] drm/i915/audio : Consider fractional vdsc bpp while computing tu_data

2023-09-12 Thread Mitul Golani
From: Ankit Nautiyal 

MTL+ supports fractional compressed bits_per_pixel, with precision of
1/16. This compressed bpp is stored in U6.4 format.
Accommodate the precision during calculation of transfer unit data
for hblank_early calculation.

v2:
-Fixed tu_data calculation while dealing with U6.4 format. (Stan)

Signed-off-by: Ankit Nautiyal 
Reviewed-by: Suraj Kandpal 
---
 drivers/gpu/drm/i915/display/intel_audio.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_audio.c b/drivers/gpu/drm/i915/display/intel_audio.c
index 4f1db1581316..3b08be54ce4f 100644
--- a/drivers/gpu/drm/i915/display/intel_audio.c
+++ b/drivers/gpu/drm/i915/display/intel_audio.c
@@ -522,25 +522,25 @@ static unsigned int calc_hblank_early_prog(struct intel_encoder *encoder,
unsigned int link_clks_available, link_clks_required;
unsigned int tu_data, tu_line, link_clks_active;
unsigned int h_active, h_total, hblank_delta, pixel_clk;
-   unsigned int fec_coeff, cdclk, vdsc_bpp;
+   unsigned int fec_coeff, cdclk, vdsc_bppx16;
unsigned int link_clk, lanes;
unsigned int hblank_rise;
 
h_active = crtc_state->hw.adjusted_mode.crtc_hdisplay;
h_total = crtc_state->hw.adjusted_mode.crtc_htotal;
pixel_clk = crtc_state->hw.adjusted_mode.crtc_clock;
-   vdsc_bpp = intel_fractional_bpp_from_x16(crtc_state->dsc.compressed_bpp_x16);
+   vdsc_bppx16 = crtc_state->dsc.compressed_bpp_x16;
cdclk = i915->display.cdclk.hw.cdclk;
/* fec= 0.972261, using rounding multiplier of 100 */
fec_coeff = 972261;
link_clk = crtc_state->port_clock;
lanes = crtc_state->lane_count;
 
-   drm_dbg_kms(&i915->drm, "h_active = %u link_clk = %u :"
-   "lanes = %u vdsc_bpp = %u cdclk = %u\n",
-   h_active, link_clk, lanes, vdsc_bpp, cdclk);
+   drm_dbg_kms(&i915->drm,
+   "h_active = %u link_clk = %u : lanes = %u vdsc_bppx16 = %u cdclk = %u\n",
+   h_active, link_clk, lanes, vdsc_bppx16, cdclk);
 
-   if (WARN_ON(!link_clk || !pixel_clk || !lanes || !vdsc_bpp || !cdclk))
+   if (WARN_ON(!link_clk || !pixel_clk || !lanes || !vdsc_bppx16 || !cdclk))
return 0;
 
link_clks_available = (h_total - h_active) * link_clk / pixel_clk - 28;
@@ -552,8 +552,8 @@ static unsigned int calc_hblank_early_prog(struct intel_encoder *encoder,
hblank_delta = DIV64_U64_ROUND_UP(mul_u32_u32(5 * (link_clk + cdclk), pixel_clk),
  mul_u32_u32(link_clk, cdclk));
 
-   tu_data = div64_u64(mul_u32_u32(pixel_clk * vdsc_bpp * 8, 100),
-   mul_u32_u32(link_clk * lanes, fec_coeff));
+   tu_data = div64_u64(mul_u32_u32(pixel_clk * vdsc_bppx16 * 8, 100),
+   mul_u32_u32(link_clk * lanes * 16, fec_coeff));
tu_line = div64_u64(h_active * mul_u32_u32(link_clk, fec_coeff),
mul_u32_u32(64 * pixel_clk, 100));
link_clks_active  = (tu_line - 1) * 64 + tu_data;
-- 
2.25.1



[PATCH 3/8] drm/i915/display: Consider fractional vdsc bpp while computing m_n values

2023-09-12 Thread Mitul Golani
From: Ankit Nautiyal 

MTL+ supports fractional compressed bits_per_pixel, with precision of
1/16. This compressed bpp is stored in U6.4 format.
Accommodate this precision while computing m_n values.

Signed-off-by: Ankit Nautiyal 
Reviewed-by: Suraj Kandpal 
---
 drivers/gpu/drm/i915/display/intel_display.c | 6 +-
 drivers/gpu/drm/i915/display/intel_display.h | 2 +-
 drivers/gpu/drm/i915/display/intel_dp.c  | 5 +++--
 drivers/gpu/drm/i915/display/intel_dp_mst.c  | 6 --
 drivers/gpu/drm/i915/display/intel_fdi.c | 2 +-
 5 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index afcbdd4f105a..b37aeac961f4 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -2380,10 +2380,14 @@ void
 intel_link_compute_m_n(u16 bits_per_pixel, int nlanes,
   int pixel_clock, int link_clock,
   struct intel_link_m_n *m_n,
-  bool fec_enable)
+  bool fec_enable,
+  bool is_dsc_fractional_bpp)
 {
u32 data_clock = bits_per_pixel * pixel_clock;
 
+   if (is_dsc_fractional_bpp)
+   data_clock = DIV_ROUND_UP(bits_per_pixel * pixel_clock, 16);
+
if (fec_enable)
data_clock = intel_dp_mode_to_fec_clock(data_clock);
 
diff --git a/drivers/gpu/drm/i915/display/intel_display.h b/drivers/gpu/drm/i915/display/intel_display.h
index 49ac8473b988..a4c4ca3cad65 100644
--- a/drivers/gpu/drm/i915/display/intel_display.h
+++ b/drivers/gpu/drm/i915/display/intel_display.h
@@ -398,7 +398,7 @@ u8 intel_calc_active_pipes(struct intel_atomic_state *state,
 void intel_link_compute_m_n(u16 bpp, int nlanes,
int pixel_clock, int link_clock,
struct intel_link_m_n *m_n,
-   bool fec_enable);
+   bool fec_enable, bool is_dsc_fractional_bpp);
 u32 intel_plane_fb_max_stride(struct drm_i915_private *dev_priv,
  u32 pixel_format, u64 modifier);
 enum drm_mode_status
diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
index 1891e3ead174..a359a8d65dbd 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -2558,7 +2558,7 @@ intel_dp_drrs_compute_config(struct intel_connector *connector,
 
intel_link_compute_m_n(link_bpp, pipe_config->lane_count, pixel_clock,
   pipe_config->port_clock, &pipe_config->dp_m2_n2,
-  pipe_config->fec_enable);
+  pipe_config->fec_enable, false);
 
/* FIXME: abstract this better */
if (pipe_config->splitter.enable)
@@ -2737,7 +2737,8 @@ intel_dp_compute_config(struct intel_encoder *encoder,
   adjusted_mode->crtc_clock,
   pipe_config->port_clock,
   &pipe_config->dp_m_n,
-  pipe_config->fec_enable);
+  pipe_config->fec_enable,
+  pipe_config->dsc.compression_enable);
 
/* FIXME: abstract this better */
if (pipe_config->splitter.enable)
diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index 350c561775d4..bdc6955e517b 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -172,7 +172,8 @@ static int intel_dp_mst_compute_link_config(struct intel_encoder *encoder,
   adjusted_mode->crtc_clock,
   crtc_state->port_clock,
   &crtc_state->dp_m_n,
-  crtc_state->fec_enable);
+  crtc_state->fec_enable,
+  false);
crtc_state->dp_m_n.tu = slots;
 
return 0;
@@ -267,7 +268,8 @@ static int intel_dp_dsc_mst_compute_link_config(struct intel_encoder *encoder,
   adjusted_mode->crtc_clock,
   crtc_state->port_clock,
   &crtc_state->dp_m_n,
-  crtc_state->fec_enable);
+  crtc_state->fec_enable,
+  crtc_state->dsc.compression_enable);
crtc_state->dp_m_n.tu = slots;
 
return 0;
diff --git a/drivers/gpu/drm/i915/display/intel_fdi.c b/drivers/gpu/drm/i915/display/intel_fdi.c
index e12b46a84fa1..15fddabf7c2e 100644
--- a/drivers/gpu/drm/i915/display/intel_fdi.c
+++ b/drivers/gpu/drm/i915/display/intel_fdi.c
@@ -259,7 +259,7 @@ int ilk_fdi_compute_config(struct intel_crtc *crtc,
pipe_config->fdi_lanes = lane;
 
intel_link_compute_m_n(pipe_config-

[PATCH 1/8] drm/display/dp: Add helper function to get DSC bpp precision

2023-09-12 Thread Mitul Golani
From: Ankit Nautiyal 

Add helper to get the DSC bits_per_pixel precision for the DP sink.

Signed-off-by: Ankit Nautiyal 
---
 drivers/gpu/drm/display/drm_dp_helper.c | 27 +
 include/drm/display/drm_dp_helper.h |  1 +
 2 files changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/display/drm_dp_helper.c b/drivers/gpu/drm/display/drm_dp_helper.c
index 8a1b64c57dfd..5c23d5b8fc50 100644
--- a/drivers/gpu/drm/display/drm_dp_helper.c
+++ b/drivers/gpu/drm/display/drm_dp_helper.c
@@ -2323,6 +2323,33 @@ int drm_dp_read_desc(struct drm_dp_aux *aux, struct drm_dp_desc *desc,
 }
 EXPORT_SYMBOL(drm_dp_read_desc);
 
+/**
+ * drm_dp_dsc_sink_bpp_incr() - Get bits per pixel increment
+ * @dsc_dpcd: DSC capabilities from DPCD
+ *
+ * Returns the bpp precision supported by the DP sink.
+ */
+u8 drm_dp_dsc_sink_bpp_incr(const u8 dsc_dpcd[DP_DSC_RECEIVER_CAP_SIZE])
+{
+   u8 bpp_increment_dpcd = dsc_dpcd[DP_DSC_BITS_PER_PIXEL_INC - DP_DSC_SUPPORT];
+
+   switch (bpp_increment_dpcd) {
+   case DP_DSC_BITS_PER_PIXEL_1_16:
+   return 16;
+   case DP_DSC_BITS_PER_PIXEL_1_8:
+   return 8;
+   case DP_DSC_BITS_PER_PIXEL_1_4:
+   return 4;
+   case DP_DSC_BITS_PER_PIXEL_1_2:
+   return 2;
+   case DP_DSC_BITS_PER_PIXEL_1_1:
+   return 1;
+   }
+
+   return 0;
+}
+EXPORT_SYMBOL(drm_dp_dsc_sink_bpp_incr);
+
 /**
  * drm_dp_dsc_sink_max_slice_count() - Get the max slice count
  * supported by the DSC sink.
diff --git a/include/drm/display/drm_dp_helper.h b/include/drm/display/drm_dp_helper.h
index 3369104e2d25..6968d4d87931 100644
--- a/include/drm/display/drm_dp_helper.h
+++ b/include/drm/display/drm_dp_helper.h
@@ -164,6 +164,7 @@ drm_dp_is_branch(const u8 dpcd[DP_RECEIVER_CAP_SIZE])
 }
 
 /* DP/eDP DSC support */
+u8 drm_dp_dsc_sink_bpp_incr(const u8 dsc_dpcd[DP_DSC_RECEIVER_CAP_SIZE]);
 u8 drm_dp_dsc_sink_max_slice_count(const u8 dsc_dpcd[DP_DSC_RECEIVER_CAP_SIZE],
   bool is_edp);
 u8 drm_dp_dsc_sink_line_buf_depth(const u8 dsc_dpcd[DP_DSC_RECEIVER_CAP_SIZE]);
-- 
2.25.1



[PATCH 0/8] Add DSC fractional bpp support

2023-09-12 Thread Mitul Golani
This patch series adds support for DSC fractional compressed bpp
for MTL+. The series starts with some fixes, followed by patches that
lay groundwork to iterate over valid compressed bpps to select the
'best' compressed bpp with optimal link configuration (taken from
upstream series: https://patchwork.freedesktop.org/series/105200/).

The later patches, add changes to accommodate compressed bpp with
fractional part, including changes to QP calculations.
To get the 'best' compressed bpp, we iterate over the valid compressed
bpp values, but with fractional step size 1/16, 1/8, 1/4 or 1/2 as per
sink support.

The last 2 patches add support to depict DSC sink's fractional support,
and debugfs to enforce use of fractional bpp, while choosing an
appropriate compressed bpp.

Ankit Nautiyal (5):
  drm/display/dp: Add helper function to get DSC bpp precision
  drm/i915/display: Store compressed bpp in U6.4 format
  drm/i915/display: Consider fractional vdsc bpp while computing m_n
values
  drm/i915/audio : Consider fractional vdsc bpp while computing tu_data
  drm/i915/dp: Iterate over output bpp with fractional step size

Swati Sharma (2):
  drm/i915/dsc: Add debugfs entry to validate DSC fractional bpp
  drm/i915/dsc: Allow DSC only with fractional bpp when forced from
debugfs

Vandita Kulkarni (1):
  drm/i915/dsc/mtl: Add support for fractional bpp


 drivers/gpu/drm/display/drm_dp_helper.c   | 27 ++
 drivers/gpu/drm/i915/display/icl_dsi.c| 11 +--
 drivers/gpu/drm/i915/display/intel_audio.c| 17 ++--
 drivers/gpu/drm/i915/display/intel_bios.c |  6 +-
 drivers/gpu/drm/i915/display/intel_cdclk.c|  5 +-
 drivers/gpu/drm/i915/display/intel_display.c  |  8 +-
 drivers/gpu/drm/i915/display/intel_display.h  |  2 +-
 .../drm/i915/display/intel_display_debugfs.c  | 83 +++
 .../drm/i915/display/intel_display_types.h|  4 +-
 drivers/gpu/drm/i915/display/intel_dp.c   | 78 ++---
 drivers/gpu/drm/i915/display/intel_dp_mst.c   | 28 ---
 drivers/gpu/drm/i915/display/intel_fdi.c  |  2 +-
 .../i915/display/intel_fractional_helper.h| 36 
 .../gpu/drm/i915/display/intel_qp_tables.c|  3 -
 drivers/gpu/drm/i915/display/intel_vdsc.c | 30 +--
 include/drm/display/drm_dp_helper.h   |  1 +
 16 files changed, 268 insertions(+), 73 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/display/intel_fractional_helper.h

-- 
2.25.1



Re: [Freedreno] [PATCH] drm/msm/dp: skip validity check for DP CTS EDID checksum

2023-09-12 Thread Abhinav Kumar

Hi Jani

On 9/12/2023 5:16 AM, Jani Nikula wrote:

On Thu, 07 Sep 2023, Stephen Boyd  wrote:

Quoting Jani Nikula (2023-09-01 07:20:34)

The DP CTS test for EDID last block checksum expects the checksum for
the last block, invalid or not. Skip the validity check.

For the most part (*), the EDIDs returned by drm_get_edid() will be
valid anyway, and there's the CTS workaround to get the checksum for
completely invalid EDIDs. See commit 7948fe12d47a ("drm/msm/dp: return
correct edid checksum after corrupted edid checksum read").

This lets us remove one user of drm_edid_block_valid() with hopes the
function can be removed altogether in the future.

(*) drm_get_edid() ignores checksum errors on CTA extensions.

Cc: Abhinav Kumar 
Cc: Dmitry Baryshkov 
Cc: Kuogee Hsieh 
Cc: Marijn Suijten 
Cc: Rob Clark 
Cc: Sean Paul 
Cc: Stephen Boyd 
Cc: linux-arm-...@vger.kernel.org
Cc: freedr...@lists.freedesktop.org
Signed-off-by: Jani Nikula 
---


Reviewed-by: Stephen Boyd 


Thanks; is that enough to merge? I can't claim I would have been able to
test this.



Reviewed-by: Abhinav Kumar 

Change looks fine.

We can pick this up in the MSM tree if you would like.

Dmitry, can you please pick this up along with my R-b and Kuogee's R-b
as well.


I think his R-b got misformatted. I can ask him to add that again.





diff --git a/drivers/gpu/drm/msm/dp/dp_panel.c 
b/drivers/gpu/drm/msm/dp/dp_panel.c
index 42d52510ffd4..86a8e06c7a60 100644
--- a/drivers/gpu/drm/msm/dp/dp_panel.c
+++ b/drivers/gpu/drm/msm/dp/dp_panel.c
@@ -289,26 +289,9 @@ int dp_panel_get_modes(struct dp_panel *dp_panel,

  static u8 dp_panel_get_edid_checksum(struct edid *edid)


It would be nice to make 'edid' const here in another patch.


Sure.

BR,
Jani.




Re: [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation

2023-09-12 Thread Thomas Hellström

Hi, Danilo,

On 9/9/23 17:31, Danilo Krummrich wrote:

So far the DRM GPUVA manager offers common infrastructure to track GPU VA
allocations and mappings, generically connect GPU VA mappings to their
backing buffers and perform more complex mapping operations on the GPU VA
space.

However, there are more design patterns commonly used by drivers, which
can potentially be generalized in order to make the DRM GPUVA manager
represent a basic GPU-VM implementation. In this context, this patch aims
at generalizing the following elements.

1) Provide a common dma-resv for GEM objects not being used outside of
this GPU-VM.

2) Provide tracking of external GEM objects (GEM objects which are
shared with other GPU-VMs).

3) Provide functions to efficiently lock all GEM objects dma-resv the
GPU-VM contains mappings of.

4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
of, such that validation of evicted GEM objects is accelerated.

5) Provide some convenience functions for common patterns.

Rather than being designed as a "framework", the target is to make all
features appear as a collection of optional helper functions, such that
drivers are free to make use of the DRM GPUVA managers basic
functionality and opt-in for other features without setting any feature
flags, just by making use of the corresponding functions.

Big kudos to Boris Brezillon for his help to figure out locking for drivers
updating the GPU VA space within the fence signalling path.

Suggested-by: Matthew Brost 
Signed-off-by: Danilo Krummrich 
---
  drivers/gpu/drm/drm_gpuvm.c | 516 
  include/drm/drm_gpuvm.h | 197 ++
  2 files changed, 713 insertions(+)

diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
index f4411047dbb3..8e62a043f719 100644
--- a/drivers/gpu/drm/drm_gpuvm.c
+++ b/drivers/gpu/drm/drm_gpuvm.c
@@ -73,6 +73,21 @@
   * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this
 * particular combination. If not existent a new instance is created and linked
   * to the &drm_gem_object.
+ *
+ * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used
+ * as entry for the &drm_gpuvm's lists of external and evicted objects. Those
+ * lists are maintained in order to accelerate locking of dma-resv locks and
+ * validation of evicted objects bound in a &drm_gpuvm. For instance, all
+ * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling
+ * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in
+ * order to validate all evicted &drm_gem_objects. It is also possible to lock
+ * additional &drm_gem_objects by providing the corresponding parameters to
+ * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making
+ * use of helper functions such as drm_gpuvm_prepare_range() or
+ * drm_gpuvm_prepare_objects().
+ *
+ * Every bound &drm_gem_object is treated as an external object when its &dma_resv
+ * structure is different from the &drm_gpuvm's common &dma_resv structure.
   */
  
  /**

@@ -420,6 +435,20 @@
   * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and
   * &drm_gem_object must be able to observe previous creations and destructions
   * of &drm_gpuvm_bos in order to keep instances unique.
+ *
+ * The &drm_gpuvm's lists for keeping track of external and evicted objects are
+ * protected against concurrent insertion / removal and iteration internally.
+ *
+ * However, drivers still need to protect concurrent calls to functions
+ * iterating those lists, such as drm_gpuvm_validate() and
+ * drm_gpuvm_prepare_objects(). Every such function contains a particular
+ * comment and lockdep checks if possible.
+ *
+ * Functions adding or removing entries from those lists, such as
+ * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be called with external
+ * locks being held, e.g. in order to avoid the corresponding list to be
+ * (safely) modified while potentially being iterated by other API functions.
+ * However, this is entirely optional.
   */
  
  /**

@@ -632,6 +661,131 @@
   *}
   */
  
+/**

+ * get_next_vm_bo_from_list() - get the next vm_bo element
+ * @__gpuvm: The GPU VM
+ * @__list_name: The name of the list we're iterating on
+ * @__local_list: A pointer to the local list used to store already iterated items
+ * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo()
+ *
+ * This helper is here to provide lockless list iteration. Lockless as in, the
+ * iterator releases the lock immediately after picking the first element from
+ * the list, so list insertion and deletion can happen concurrently.


Are the list spinlocks needed for that async state update from within 
the dma-fence critical section we've discussed previously?


Otherwise it should be sufficient to protect the lists with the gpuvm's 
resv (or for the extobj list with an outer lock).


If those sp

Re: [v3] drm/i915/display/lspcon: Increase LSPCON mode settle timeout

2023-09-12 Thread Joshua Pius
Yes, we've proposed this change before. The reasoning is still the
same. Added below to include in this thread as well. Is there a reason
the below explanation and test are not sufficient?

This issue affected several different CometLake-based Chrome OS device
designs. The details of the original report are in the Google partner
issue tracker (issue # 178169843), but I believe this requires a
Google partner account to access:
https://partnerissuetracker.corp.google.com/issues/178169843

The summary is that we were seeing these "*ERROR* LSPCON mode hasn't
settled" messages in the kernel logs followed by the display not
working at all. We increased the timeout to 500ms while investigation
continued and this reduced the number of occurrences of this issue:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/7b2899fc1a6f9409e8075b3153baaf02c4d1fc75

The problem continued to occur on about 2% of devices even after
increasing the timeout to 500ms. The investigation continued in issue
# 188035814, with engineers from Parade and Intel involved.
Ultimately, the recommendation from Intel engineers was to increase
the timeout further:
https://partnerissuetracker.corp.google.com/issues/188035814

The timeout was then increased to 1000ms:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/a16cfc2062e768c8e5ad8fa09b8ca127aa1ead9a

I recently ran 100 reboot trials on one device and found that the
median time for the LSPCON mode to settle was 440ms and the max was
444ms. But we know from the original reports that even after we set
the timeout to 500ms the issue continued to happen on some small
percentage of devices. So this is why I picked the larger value of
800ms.

>> This is to eliminate all cases of "*ERROR* LSPCON mode hasn't settled",
>> followed by link training errors. Intel engineers recommended increasing
>> this timeout and that does resolve the issue.
>>
>> On some CometLake-based device designs the Parade PS175 takes more than
>> 400ms to settle in PCON mode. 100 reboot trials on one device resulted
>> in a median settle time of 440ms and a maximum of 444ms. Even after
>> increasing the timeout to 500ms, 2% of devices still had this error. So
>> this increases the timeout to 800ms.
>>
>> Signed-off-by: Pablo Ceballos 
>
>I think we've been here before. Do you have a publicly available gitlab
>issue with the proper logs? If not, please file one at [1].
>
>BR,
>Jani.
>
>[1] https://gitlab.freedesktop.org/drm/intel/issues/new
>
>
>> ---
>>
>> V2: Added more details in the commit message
>> V3: Only apply the increased timeout if the vendor is Parade
>>
>> drivers/gpu/drm/i915/display/intel_lspcon.c | 21 -
>>  1 file changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/display/intel_lspcon.c 
>> b/drivers/gpu/drm/i915/display/intel_lspcon.c
>> index bb3b5355a0d9..b07eab84cc63 100644
>> --- a/drivers/gpu/drm/i915/display/intel_lspcon.c
>> +++ b/drivers/gpu/drm/i915/display/intel_lspcon.c
>> @@ -153,6 +153,24 @@ static enum drm_lspcon_mode 
>> lspcon_get_current_mode(struct intel_lspcon *lspcon)
>>   return current_mode;
>>  }
>>
>> +static u32 lspcon_get_mode_settle_timeout(struct intel_lspcon *lspcon)
>> +{
>> + u32 timeout_ms = 400;
>> +
>> + /*
>> + * On some CometLake-based device designs the Parade PS175 takes more
>> + * than 400ms to settle in PCON mode. 100 reboot trials on one device
>> + * resulted in a median settle time of 440ms and a maximum of 444ms.
>> + * Even after increasing the timeout to 500ms, 2% of devices still had
>> + * this error. So this sets the timeout to 800ms.
>> + */
>> + if (lspcon->vendor == LSPCON_VENDOR_PARADE)
>> + timeout_ms = 800;
>> +
>> + return timeout_ms;
>> +}
>> +
>> +
>>  static enum drm_lspcon_mode lspcon_wait_mode(struct intel_lspcon *lspcon,
>>   enum drm_lspcon_mode mode)
>>  {
>> @@ -167,7 +185,8 @@ static enum drm_lspcon_mode lspcon_wait_mode(struct 
>> intel_lspcon *lspcon,
>>   drm_dbg_kms(&i915->drm, "Waiting for LSPCON mode %s to settle\n",
>>  lspcon_mode_name(mode));
>>
>> - wait_for((current_mode = lspcon_get_current_mode(lspcon)) == mode, 400);
>> + wait_for((current_mode = lspcon_get_current_mode(lspcon)) == mode,
>> + lspcon_get_mode_settle_timeout(lspcon));
>>   if (current_mode != mode)
>>   drm_err(&i915->drm, "LSPCON mode hasn't settled\n");


Re: [PATCH 8/9] dt-bindings: reserved-memory: MediaTek: Add reserved memory for SVP

2023-09-12 Thread Robin Murphy

On 12/09/2023 4:53 pm, Rob Herring wrote:
> On Tue, Sep 12, 2023 at 11:13:50AM +0100, Robin Murphy wrote:
> > On 12/09/2023 9:28 am, Krzysztof Kozlowski wrote:
> > > On 12/09/2023 08:16, Yong Wu (吴勇) wrote:
> > > > Hi Rob,
> > > > 
> > > > Thanks for your review.
> > > > 
> > > > On Mon, 2023-09-11 at 10:44 -0500, Rob Herring wrote:
> > > > > 
> > > > > External email : Please do not click links or open attachments until
> > > > > you have verified the sender or the content.
> > > > >   On Mon, Sep 11, 2023 at 10:30:37AM +0800, Yong Wu wrote:
> > > > > > This adds the binding for describing a CMA memory for MediaTek
> > > > > SVP(Secure
> > > > > > Video Path).
> > > > > 
> > > > > CMA is a Linux thing. How is this related to CMA?
> > > > 
> > > > > > 
> > > > > > Signed-off-by: Yong Wu 
> > > > > > ---
> > > > > >   .../mediatek,secure_cma_chunkmem.yaml | 42
> > > > > +++
> > > > > >   1 file changed, 42 insertions(+)
> > > > > >   create mode 100644 Documentation/devicetree/bindings/reserved-
> > > > > memory/mediatek,secure_cma_chunkmem.yaml
> > > > > > 
> > > > > > diff --git a/Documentation/devicetree/bindings/reserved-
> > > > > memory/mediatek,secure_cma_chunkmem.yaml
> > > > > b/Documentation/devicetree/bindings/reserved-
> > > > > memory/mediatek,secure_cma_chunkmem.yaml
> > > > > > new file mode 100644
> > > > > > index ..cc10e00d35c4
> > > > > > --- /dev/null
> > > > > > +++ b/Documentation/devicetree/bindings/reserved-
> > > > > memory/mediatek,secure_cma_chunkmem.yaml
> > > > > > @@ -0,0 +1,42 @@
> > > > > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > > > > +%YAML 1.2
> > > > > > +---
> > > > > > +$id:
> > > > > http://devicetree.org/schemas/reserved-memory/mediatek,secure_cma_chunkmem.yaml#
> > > > > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > > > > +
> > > > > > +title: MediaTek Secure Video Path Reserved Memory
> > > > > 
> > > > > What makes this specific to Mediatek? Secure video path is fairly
> > > > > common, right?
> > > > 
> > > > Here we just reserve a buffer and would like to create a dma-buf secure
> > > > heap for SVP, then the secure engines(Vcodec and DRM) could prepare
> > > > secure buffer through it.
> > > > But the heap driver is pure SW driver, it is not platform device and
> > > 
> > > All drivers are pure SW.
> > > 
> > > > we don't have a corresponding HW unit for it. Thus I don't think I
> > > > could create a platform dtsi node and use "memory-region" pointer to
> > > > the region. I used RESERVEDMEM_OF_DECLARE currently(The code is in
> > > > [9/9]). Sorry if this is not right.
> > > 
> > > If this is not for any hardware and you already understand this (since
> > > you cannot use other bindings) then you cannot have custom bindings for
> > > it either.
> > > 
> > > > 
> > > > Then in our usage case, is there some similar method to do this? or
> > > > any other suggestion?
> > > 
> > > Don't stuff software into DTS.
> > 
> > Aren't most reserved-memory bindings just software policy if you look at it
> > that way, though? IIUC this is a pool of memory that is visible and
> > available to the Non-Secure OS, but is fundamentally owned by the Secure
> > TEE, and pages that the TEE allocates from it will become physically
> > inaccessible to the OS. Thus the platform does impose constraints on how the
> > Non-Secure OS may use it, and per the rest of the reserved-memory bindings,
> > describing it as a "reusable" reservation seems entirely appropriate. If
> > anything that's *more* platform-related and so DT-relevant than typical
> > arbitrary reservations which just represent "save some memory to dedicate to
> > a particular driver" and don't actually bear any relationship to firmware or
> > hardware at all.
> 
> Yes, a memory range defined by hardware or firmware is within scope of
> DT. (CMA at arbitrary address was questionable.)
> 
> My issue here is more that 'secure video memory' is not in any way Mediatek
> specific. AIUI, it's a requirement from certain content providers for
> video playback to work. So why the Mediatek specific binding?


Based on the implementation, I'd ask the question the other way round - 
the way it works looks to be at least somewhat dependent on Mediatek's 
TEE, in ways where other vendors' equivalent implementations may be 
functionally incompatible, however nothing suggests it's actually 
specific to video (beyond that presumably being the primary use-case 
they had in mind).


Thanks,
Robin.
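For reference, a reserved-memory node along the lines being discussed could look something like the sketch below. This is purely illustrative: the compatible string is inferred from the patch's filename, the "reusable" property comes from Robin's description, and the address/size are made-up examples, not values from the actual binding.

```dts
/ {
	reserved-memory {
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;

		/* Hypothetical example: a reusable reservation that the TEE
		 * carves secure buffers out of; pages it claims become
		 * inaccessible to the Non-Secure OS. */
		secure_cma_reserved: secure-cma@80000000 {
			compatible = "mediatek,secure_cma_chunkmem";
			reg = <0 0x80000000 0 0x10000000>; /* example: 256 MiB */
			reusable;
		};
	};
};
```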


Re: [PATCH v3] drm/plane: Add documentation about software color conversion.

2023-09-12 Thread Michel Dänzer
On 9/11/23 10:38, Pekka Paalanen wrote:
> On Fri, 8 Sep 2023 17:10:46 +0200
> Thomas Zimmermann  wrote:
>> Am 08.09.23 um 16:41 schrieb Pekka Paalanen:
>>> On Fri, 8 Sep 2023 15:56:51 +0200
>>> Thomas Zimmermann  wrote:
 
 I have a number of concerns. My point it not that we shouldn't optimize.
 I just don't want it in the kernel. Mgag200 can export DRM_FORMAT_RGB888
 for userspace to use.

 AFAICT the main argument against userspace is that Mesa doesn't like
 3-byte pixels. But I don't see how this conversion cannot be a
 post-processing step within Mesa: do the rendering in RGB32 and then
 convert to a framebuffer in RGB24.

Even assuming the conversion could be handled transparently in Mesa, it would 
still require the KMS client to pick RGB888 instead of XRGB8888. Most (all?) 
KMS clients support XRGB8888, many of them will realistically never support 
RGB888. (Or even if they did, they might prefer XRGB8888 anyway, if RGB888 
requires a final conversion step)


 Another point of concern is CPU consumption: Slow I/O buses may stall
 the display thread, but the CPU could do something else in the meantime.
 Doing format conversion on the CPU prevents that, hence affecting other
 parts of the system negatively. Of course, that's more of a gut feeling
 than hard data.

Jocelyn, have you measured if the XRGB8888 -> RGB888 conversion copy takes 
longer than a straight RGB888 -> RGB888 copy in the kernel?


-- 
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



Re: [PATCH 8/9] dt-bindings: reserved-memory: MediaTek: Add reserved memory for SVP

2023-09-12 Thread Rob Herring
On Tue, Sep 12, 2023 at 11:13:50AM +0100, Robin Murphy wrote:
> On 12/09/2023 9:28 am, Krzysztof Kozlowski wrote:
> > On 12/09/2023 08:16, Yong Wu (吴勇) wrote:
> > > Hi Rob,
> > > 
> > > Thanks for your review.
> > > 
> > > On Mon, 2023-09-11 at 10:44 -0500, Rob Herring wrote:
> > > > 
> > > > External email : Please do not click links or open attachments until
> > > > you have verified the sender or the content.
> > > >   On Mon, Sep 11, 2023 at 10:30:37AM +0800, Yong Wu wrote:
> > > > > This adds the binding for describing a CMA memory for MediaTek
> > > > SVP(Secure
> > > > > Video Path).
> > > > 
> > > > CMA is a Linux thing. How is this related to CMA?
> > > 
> > > > > 
> > > > > Signed-off-by: Yong Wu 
> > > > > ---
> > > > >   .../mediatek,secure_cma_chunkmem.yaml | 42
> > > > +++
> > > > >   1 file changed, 42 insertions(+)
> > > > >   create mode 100644 Documentation/devicetree/bindings/reserved-
> > > > memory/mediatek,secure_cma_chunkmem.yaml
> > > > > 
> > > > > diff --git a/Documentation/devicetree/bindings/reserved-
> > > > memory/mediatek,secure_cma_chunkmem.yaml
> > > > b/Documentation/devicetree/bindings/reserved-
> > > > memory/mediatek,secure_cma_chunkmem.yaml
> > > > > new file mode 100644
> > > > > index ..cc10e00d35c4
> > > > > --- /dev/null
> > > > > +++ b/Documentation/devicetree/bindings/reserved-
> > > > memory/mediatek,secure_cma_chunkmem.yaml
> > > > > @@ -0,0 +1,42 @@
> > > > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > > > +%YAML 1.2
> > > > > +---
> > > > > +$id:
> > > > http://devicetree.org/schemas/reserved-memory/mediatek,secure_cma_chunkmem.yaml#
> > > > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > > > +
> > > > > +title: MediaTek Secure Video Path Reserved Memory
> > > > 
> > > > What makes this specific to Mediatek? Secure video path is fairly
> > > > common, right?
> > > 
> > > Here we just reserve a buffer and would like to create a dma-buf secure
> > > heap for SVP, then the secure engines(Vcodec and DRM) could prepare
> > > secure buffer through it.
> > > But the heap driver is pure SW driver, it is not platform device and
> > 
> > All drivers are pure SW.
> > 
> > > we don't have a corresponding HW unit for it. Thus I don't think I
> > > could create a platform dtsi node and use "memory-region" pointer to
> > > the region. I used RESERVEDMEM_OF_DECLARE currently(The code is in
> > > [9/9]). Sorry if this is not right.
> > 
> > If this is not for any hardware and you already understand this (since
> > you cannot use other bindings) then you cannot have custom bindings for
> > it either.
> > 
> > > 
> > > Then in our usage case, is there some similar method to do this? or
> > > any other suggestion?
> > 
> > Don't stuff software into DTS.
> 
> Aren't most reserved-memory bindings just software policy if you look at it
> that way, though? IIUC this is a pool of memory that is visible and
> available to the Non-Secure OS, but is fundamentally owned by the Secure
> TEE, and pages that the TEE allocates from it will become physically
> inaccessible to the OS. Thus the platform does impose constraints on how the
> Non-Secure OS may use it, and per the rest of the reserved-memory bindings,
> describing it as a "reusable" reservation seems entirely appropriate. If
> anything that's *more* platform-related and so DT-relevant than typical
> arbitrary reservations which just represent "save some memory to dedicate to
> a particular driver" and don't actually bear any relationship to firmware or
> hardware at all.

Yes, a memory range defined by hardware or firmware is within scope of 
DT. (CMA at arbitrary address was questionable.)

My issue here is more that 'secure video memory' is not in any way Mediatek 
specific. AIUI, it's a requirement from certain content providers for 
video playback to work. So why the Mediatek specific binding?

Rob


Re: [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity

2023-09-12 Thread Matthew Brost
On Tue, Sep 12, 2023 at 10:11:56PM +0800, kernel test robot wrote:
> Hi Matthew,
> 
> kernel test robot noticed the following build errors:
> 
> [auto build test ERROR on drm/drm-next]
> [also build test ERROR on drm-exynos/exynos-drm-next 
> drm-intel/for-linux-next-fixes drm-tip/drm-tip linus/master v6.6-rc1 
> next-20230912]
> [cannot apply to drm-misc/drm-misc-next drm-intel/for-linux-next]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:
> https://github.com/intel-lab-lkp/linux/commits/Matthew-Brost/drm-sched-Add-drm_sched_submit_-helpers/20230912-102001
> base:   git://anongit.freedesktop.org/drm/drm drm-next
> patch link:
> https://lore.kernel.org/r/20230912021615.2086698-4-matthew.brost%40intel.com
> patch subject: [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler 
> / entity
> config: arm64-randconfig-r032-20230912 
> (https://download.01.org/0day-ci/archive/20230912/202309122100.haei8ytj-...@intel.com/config)
> compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git 
> ae42196bc493ffe877a7e3dff8be32035dea4d07)
> reproduce (this is a W=1 build): 
> (https://download.01.org/0day-ci/archive/20230912/202309122100.haei8ytj-...@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version 
> of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot 
> | Closes: 
> https://lore.kernel.org/oe-kbuild-all/202309122100.haei8ytj-...@intel.com/
> 
> All errors (new ones prefixed by >>):
> 
> >> drivers/gpu/drm/v3d/v3d_sched.c:403:9: error: use of undeclared identifier 
> >> 'ULL'
> ULL, "v3d_render", DRM_SCHED_POLICY_DEFAULT,
> ^

Typos, s/ULL/NULL in next rev.

Matt

>1 error generated.
> 
> 
> vim +/ULL +403 drivers/gpu/drm/v3d/v3d_sched.c
> 
>381
>382int
>383v3d_sched_init(struct v3d_dev *v3d)
>384{
>385int hw_jobs_limit = 1;
>386int job_hang_limit = 0;
>387int hang_limit_ms = 500;
>388int ret;
>389
>390ret = drm_sched_init(&v3d->queue[V3D_BIN].sched,
>391 &v3d_bin_sched_ops, NULL,
>392 hw_jobs_limit, job_hang_limit,
>393 msecs_to_jiffies(hang_limit_ms), 
> NULL,
>394 NULL, "v3d_bin", 
> DRM_SCHED_POLICY_DEFAULT,
>395 v3d->drm.dev);
>396if (ret)
>397return ret;
>398
>399ret = drm_sched_init(&v3d->queue[V3D_RENDER].sched,
>400 &v3d_render_sched_ops, NULL,
>401 hw_jobs_limit, job_hang_limit,
>402 msecs_to_jiffies(hang_limit_ms), 
> NULL,
>  > 403 ULL, "v3d_render", 
> DRM_SCHED_POLICY_DEFAULT,
>404 v3d->drm.dev);
>405if (ret)
>406goto fail;
>407
>408ret = drm_sched_init(&v3d->queue[V3D_TFU].sched,
>409 &v3d_tfu_sched_ops, NULL,
>410 hw_jobs_limit, job_hang_limit,
>411 msecs_to_jiffies(hang_limit_ms), 
> NULL,
>412 NULL, "v3d_tfu", 
> DRM_SCHED_POLICY_DEFAULT,
>413 v3d->drm.dev);
>414if (ret)
>415goto fail;
>416
>417if (v3d_has_csd(v3d)) {
>418ret = drm_sched_init(&v3d->queue[V3D_CSD].sched,
>419 &v3d_csd_sched_ops, NULL,
>420 hw_jobs_limit, 
> job_hang_limit,
>421 
> msecs_to_jiffies(hang_limit_ms), NULL,
>422 NULL, "v3d_csd", 
> DRM_SCHED_POLICY_DEFAULT,
>423 v3d->drm.dev);
>424 

Re: [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity

2023-09-12 Thread Matthew Brost
On Tue, Sep 12, 2023 at 09:37:06AM +0200, Boris Brezillon wrote:
> On Mon, 11 Sep 2023 19:16:05 -0700
> Matthew Brost  wrote:
> 
> > Rather than a global modparam for scheduling policy, move the scheduling
> > policy to scheduler / entity so user can control each scheduler / entity
> > policy.
> 
> I'm a bit confused by the commit message (I think I'm okay with the
> diff though). Sounds like entity is involved in the sched policy
> choice, but AFAICT, it just has to live with the scheduler policy chosen
> by the driver at init time. If my understanding is correct, I'd just
> drop the ' / entity'.

Yep, stale commit message. Will fix in next rev.

Matt


Re: [PATCH v2 1/9] drm/sched: Convert drm scheduler to use a work queue rather than kthread

2023-09-12 Thread Boris Brezillon
On Tue, 12 Sep 2023 16:49:09 +0200
Boris Brezillon  wrote:

> On Tue, 12 Sep 2023 16:33:01 +0200
> Danilo Krummrich  wrote:
> 
> > On 9/12/23 16:28, Boris Brezillon wrote:  
> > > On Thu, 17 Aug 2023 13:13:31 +0200
> > > Danilo Krummrich  wrote:
> > > 
> > >> I think that's a misunderstanding. I'm not trying to say that it is
> > >> *always* beneficial to fill up the ring as much as possible. But I think
> > >> it is under certain circumstances, exactly those circumstances I
> > >> described for Nouveau.
> > >>
> > >> As mentioned, in Nouveau the size of a job is only really limited by the
> > >> ring size, which means that one job can (but does not necessarily) fill
> > >> up the whole ring. We both agree that this is inefficient, because it
> > >> potentially results into the HW run dry due to hw_submission_limit == 1.
> > >>
> > >> I recognize you said that one should define hw_submission_limit and
> > >> adjust the other parts of the equation accordingly, the options I see 
> > >> are:
> > >>
> > >> (1) Increase the ring size while keeping the maximum job size.
> > >> (2) Decrease the maximum job size while keeping the ring size.
> > >> (3) Let the scheduler track the actual job size rather than the maximum
> > >> job size.
> > >>
> > >> (1) results into potentially wasted ring memory, because we're not
> > >> always reaching the maximum job size, but the scheduler assumes so.
> > >>
> > >> (2) results into more IOCTLs from userspace for the same amount of IBs
> > >> and more jobs result into more memory allocations and more work being
> > >> submitted to the workqueue (with Matt's patches).
> > >>
> > >> (3) doesn't seem to have any of those draw backs.
> > >>
> > >> What would be your take on that?
> > >>
> > >> Actually, if none of the other drivers is interested into a more precise
> > >> way of keeping track of the ring utilization, I'd be totally fine to do
> > >> it in a driver specific way. However, unfortunately I don't see how this
> > >> would be possible.
> > > 
> > > I'm not entirely sure, but I think PowerVR is pretty close to your
> > > description: jobs size is dynamic size, and the ring buffer size is
> > > picked by the driver at queue initialization time. What we did was to
> > > set hw_submission_limit to an arbitrarily high value of 64k (we could
> > > have used something like ringbuf_size/min_job_size instead), and then
> > > have the control flow implemented with ->prepare_job() [1] (CCCB is the
> > > PowerVR ring buffer). This allows us to maximize ring buffer utilization
> > > while still allowing dynamic-size jobs.
> > 
> > I guess this would work, but I think it would be better to bake this in,
> > especially if more drivers do have this need. I already have an
> > implementation [1] for doing that in the scheduler. My plan was to push
> > that as soon as Matt sends out V3.
> > 
> > [1] 
> > https://gitlab.freedesktop.org/nouvelles/kernel/-/commit/269f05d6a2255384badff8b008b3c32d640d2d95
> >   
> 
> PowerVR's ->can_fit_in_ringbuf() logic is a bit more involved in that
> native fences waits are passed to the FW, and those add to the job size.
> When we know our job is ready for execution (all non-native deps are
> signaled), we evict already signaled native-deps (or native fences) to
> shrink the job size further more, but that's something we need to
> calculate late if we want the job size to be minimal. Of course, we can
> always over-estimate the job size, but if we go for a full-blown
> drm_sched integration, I wonder if it wouldn't be preferable to have a
> ->get_job_size() callback returning the number of units needed by job,  
> and have the core pick 1 when the hook is not implemented.

FWIW, I think last time I asked how to do that, I've been pointed to
->prepare_job() by someone  (don't remember if it was Daniel or
Christian), hence the PowerVR implementation. If that's still the
preferred solution, there's some opportunity to have a generic layer to
automate ringbuf utilization tracking and some helpers to prepare
wait_for_ringbuf dma_fences that drivers could return from
->prepare_job() (those fences would then be signaled when the driver
calls drm_ringbuf_job_done() and the next job waiting for ringbuf space
now fits in the ringbuf).


Re: [PATCH 3/9] dma-heap: Provide accessors so that in-kernel drivers can allocate dmabufs from specific heaps

2023-09-12 Thread Nicolas Dufresne
On Tuesday 12 September 2023 at 08:47 +, Yong Wu (吴勇) wrote:
> On Mon, 2023-09-11 at 12:12 -0400, Nicolas Dufresne wrote:
> >  
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> >  Hi,
> > 
> > On Monday 11 September 2023 at 10:30 +0800, Yong Wu wrote:
> > > From: John Stultz 
> > > 
> > > This allows drivers who don't want to create their own
> > > DMA-BUF exporter to be able to allocate DMA-BUFs directly
> > > from existing DMA-BUF Heaps.
> > > 
> > > There is some concern that the premise of DMA-BUF heaps is
> > > that userland knows better about what type of heap memory
> > > is needed for a pipeline, so it would likely be best for
> > > drivers to import and fill DMA-BUFs allocated by userland
> > > instead of allocating one themselves, but this is still
> > > up for debate.
> > 
> > 
> > Would be nice for the reviewers to provide the information about the
> > user of
> > this new in-kernel API. I noticed it because I was CCed, but
> > strangely it didn't
> > make it to the mailing list yet and its not clear in the cover what
> > this is used
> > with. 
> > 
> > I can explain in my words though, my read is that this is used to
> > allocate both
> > user visible and driver internal memory segments in MTK VCODEC
> > driver.
> > 
> > I'm somewhat concerned that DMABuf objects are used to abstract
> > secure memory
> > allocation from tee. For framebuffers that are going to be exported
> > and shared
> > its probably fair use, but it seems that internal shared memory and
> > codec
> > specific reference buffers also endup with a dmabuf fd (often called
> > a secure fd
> > in the v4l2 patchset) for data that is not being shared, and requires
> > a 1:1
> > mapping to a tee handle anyway. Is that the design we'd like to
> > follow ? 
> 
> Yes. basically this is right.
> 
> > Can't
> > we directly allocate from the tee, adding needed helper to make this
> > as simple
> > as allocating from a HEAP ?
> 
> If this happens, the memory will always be inside TEE. Here we create a
> new _CMA heap, it will cma_alloc/free dynamically. Reserve it before
> SVP start, and release to kernel after SVP done.

Ok, I see the benefit of having a common driver then. It would add to the
complexity, but having a driver for the tee allocator and v4l2/heaps would be
another option?

>   
> Secondly. the v4l2/drm has the mature driver control flow, like
> drm_gem_prime_import_dev that always use dma_buf ops. So we can use the
> current flow as much as possible without having to re-plan a flow in
> the TEE.

From what I've read of Yunfei's series, this is only partially true for V4L2.
The vb2 queue MMAP feature has dmabuf exportation as optional, but it's not a
problem to always back it up with a dmabuf object. But for internal SHM buffers
used for firmware communication, I've never seen any driver use a DMABuf.

Same applies for primary decode buffers when frame buffer compression or post-
processing is used (or reconstruction buffer in encoders), these are not user
visible and are usually not DMABuf.

> 
> > 
> > Nicolas
> > 
> > > 
> > > Signed-off-by: John Stultz 
> > > Signed-off-by: T.J. Mercier 
> > > Signed-off-by: Yong Wu 
> > > [Yong: Fix the checkpatch alignment warning]
> > > ---
> > >  drivers/dma-buf/dma-heap.c | 60 
> > --
> > >  include/linux/dma-heap.h   | 25 
> > >  2 files changed, 69 insertions(+), 16 deletions(-)
> > > 
> [snip]



Re: [PATCH v3 02/13] drm/sched: Convert drm scheduler to use a work queue rather than kthread

2023-09-12 Thread Matthew Brost
On Tue, Sep 12, 2023 at 09:29:53AM +0200, Boris Brezillon wrote:
> On Mon, 11 Sep 2023 19:16:04 -0700
> Matthew Brost  wrote:
> 
> > @@ -1071,6 +1063,7 @@ static int drm_sched_main(void *param)
> >   *
> >   * @sched: scheduler instance
> >   * @ops: backend operations for this scheduler
> > + * @submit_wq: workqueue to use for submission. If NULL, the system_wq is 
> > used
> >   * @hw_submission: number of hw submissions that can be in flight
> >   * @hang_limit: number of times to allow a job to hang before dropping it
> >   * @timeout: timeout value in jiffies for the scheduler
> > @@ -1084,14 +1077,16 @@ static int drm_sched_main(void *param)
> >   */
> >  int drm_sched_init(struct drm_gpu_scheduler *sched,
> >const struct drm_sched_backend_ops *ops,
> > +  struct workqueue_struct *submit_wq,
> >unsigned hw_submission, unsigned hang_limit,
> >long timeout, struct workqueue_struct *timeout_wq,
> >atomic_t *score, const char *name, struct device *dev)
> >  {
> > -   int i, ret;
> > +   int i;
> > sched->ops = ops;
> > sched->hw_submission_limit = hw_submission;
> > sched->name = name;
> > +   sched->submit_wq = submit_wq ? : system_wq;
> 
> My understanding is that the new design is based on the idea of
> splitting the drm_sched_main function into work items that can be
> scheduled independently so users/drivers can insert their own
> steps/works without requiring changes to drm_sched. This approach is
> relying on the properties of ordered workqueues (1 work executed at a
> time, FIFO behavior) to guarantee that these steps are still executed
> in order, and one at a time.
> 
> Given what you're trying to achieve I think we should create an ordered
> workqueue instead of using the system_wq when submit_wq is NULL,
> otherwise you lose this ordering/serialization guarantee which both
> the dedicated kthread and ordered wq provide. It will probably work for
> most drivers, but might lead to subtle/hard to spot ordering issues.
> 

I debated choosing between a system_wq or creating an ordered-wq by
default myself. Indeed using the system_wq by default subtly changes
the behavior as run_job & free_job workers can run in parallel. To be
safe, I agree the default should be an ordered-wq. If drivers are fine with
run_job() and free_job() running in parallel, they are free to set
submit_wq == system_wq. Will change in next rev.

Matt

> > sched->timeout = timeout;
> > sched->timeout_wq = timeout_wq ? : system_wq;
> > sched->hang_limit = hang_limit;
> > @@ -1100,23 +1095,15 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> > for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
> > drm_sched_rq_init(sched, &sched->sched_rq[i]);
> >  
> > -   init_waitqueue_head(&sched->wake_up_worker);
> > init_waitqueue_head(&sched->job_scheduled);
> > INIT_LIST_HEAD(&sched->pending_list);
> > spin_lock_init(&sched->job_list_lock);
> > atomic_set(&sched->hw_rq_count, 0);
> > INIT_DELAYED_WORK(&sched->work_tdr, drm_sched_job_timedout);
> > +   INIT_WORK(&sched->work_submit, drm_sched_main);
> > atomic_set(&sched->_score, 0);
> > atomic64_set(&sched->job_id_count, 0);
> > -
> > -   /* Each scheduler will run on a seperate kernel thread */
> > -   sched->thread = kthread_run(drm_sched_main, sched, sched->name);
> > -   if (IS_ERR(sched->thread)) {
> > -   ret = PTR_ERR(sched->thread);
> > -   sched->thread = NULL;
> > -   DRM_DEV_ERROR(sched->dev, "Failed to create scheduler for 
> > %s.\n", name);
> > -   return ret;
> > -   }
> > +   sched->pause_submit = false;
> >  
> > sched->ready = true;
> > return 0;


Re: [PATCH 3/9] dma-heap: Provide accessors so that in-kernel drivers can allocate dmabufs from specific heaps

2023-09-12 Thread Nicolas Dufresne
On Tuesday 12 September 2023 at 16:46 +0200, Christian König wrote:
> On 12.09.23 at 10:52, Yong Wu (吴勇) wrote:
> > [SNIP]
> > > But what we should try to avoid is that newly merged drivers provide
> > > both a driver specific UAPI and DMA-heaps. The justification that
> > > this
> > > makes it easier to transit userspace to the new UAPI doesn't really
> > > count.
> > > 
> > > That would be adding UAPI already with a plan to deprecate it and
> > > that
> > > is most likely not helpful considering that UAPI must be supported
> > > forever as soon as it is upstream.
> > Sorry, I didn't understand this. I think we have not change the UAPI.
> > Which code are you referring to?
> 
> Well, what do you need this for if not a new UAPI?
> 
> My assumption here is that you need to export the DMA-heap allocation 
> function so that you can server an UAPI in your new driver. Or what else 
> is that good for?
> 
> As far as I understand you try to upstream your new vcodec driver. So 
> while this change here seems to be a good idea to clean up existing 
> drivers it doesn't look like a good idea for a newly created driver.

MTK VCODEC has been upstream for quite some time now. The other patchset is
trying to add secure decoding/encoding support to that existing upstream driver.

Regarding the uAPI, it seems that this addition to the dmabuf heap internal API is
exactly the opposite. By making heaps available to drivers, the modification to the
V4L2 uAPI is reduced to adding "SECURE_MODE" + "SECURE_HEAP_ID" controls
(this approach is not yet settled). The heaps are used internally as a
replacement for every allocation, user visible or not.

Nicolas

> 
> Regards,
> Christian.
> 
> > > > So I think this patch is a little confusing in this series, as I don't
> > > > see much of it actually being used here (though forgive me if I'm
> > > > missing it).
> > > > 
> > > > Instead, It seems it get used in a separate patch series here:
> > > > https://lore.kernel.org/all/20230911125936.10648-1-yunfei.d...@mediatek.com/
> > > 
> > > Please try to avoid stuff like that it is really confusing and eats
> > > reviewers time.
> > My fault, I thought dma-buf and media belonged to the different tree,
> > so I send them separately. The cover letter just said "The consumers of
> > the new heap and new interface are our codecs and DRM, which will be
> > sent upstream soon", and there was no vcodec link at that time.
> > 
> > In the next version, we will put the first three patches into the
> > vcodec patchset.
> > 
> > Thanks.
> > 
> 



Re: [PATCH v3 05/13] drm/sched: Split free_job into own work item

2023-09-12 Thread Matthew Brost
On Tue, Sep 12, 2023 at 04:53:00PM +0200, Boris Brezillon wrote:
> On Tue, 12 Sep 2023 14:37:57 +
> Matthew Brost  wrote:
> 
> > > Looks like you are changing the behavior here (unconditional ->
> > > conditional timestamp update)? Probably something that should go in a
> > > separate patch.
> > >   
> > 
> > This patch creates a race so this check isn't need before this patch.
> > With that I think it makes sense to have all in a single patch. If you
> > feel strongly about this, I can break this change out into a patch prior
> > to this one.
> 
> It's probably fine to keep it in this patch, but we should
> definitely have a comment explaining why this check is needed.

Sure, will add comment in next rev.

Matt


Re: [PATCH v3 11/13] drm/sched: Waiting for pending jobs to complete in scheduler kill

2023-09-12 Thread Matthew Brost
On Tue, Sep 12, 2023 at 12:28:28PM +0200, Boris Brezillon wrote:
> On Mon, 11 Sep 2023 19:16:13 -0700
> Matthew Brost  wrote:
> 
> > +void drm_sched_add_pending_job(struct drm_sched_job *job, bool tail)
> > +{
> > +   struct drm_gpu_scheduler *sched = job->sched;
> > +   struct drm_sched_entity *entity = job->entity;
> 
> drm_sched_entity_pop_job() sets job->entity to NULL [1], and I end with
> a NULL deref in this function. I guess you have another patch in your
> tree dropping this job->entity = NULL in drm_sched_entity_pop_job(),
> but given this comment [1], it's probably not the right thing to do.
> 

Didn't fully test this one, regardless I will drop this patch in the
next rev.

Matt

> > +
> > +   lockdep_assert_held(&sched->job_list_lock);
> > +
> > +   if (tail)
> > +   list_add_tail(&job->list, &sched->pending_list);
> > +   else
> > +   list_add(&job->list, &sched->pending_list);
> > +   if (!entity->pending_job_count++)
> > +   reinit_completion(&entity->jobs_done);
> > +}
> > +EXPORT_SYMBOL(drm_sched_add_pending_job);
> 
> [1]https://elixir.bootlin.com/linux/v6.6-rc1/source/drivers/gpu/drm/scheduler/sched_entity.c#L497


Re: [PATCH v3 05/13] drm/sched: Split free_job into own work item

2023-09-12 Thread Boris Brezillon
On Tue, 12 Sep 2023 14:37:57 +
Matthew Brost  wrote:

> > Looks like you are changing the behavior here (unconditional ->
> > conditional timestamp update)? Probably something that should go in a
> > separate patch.
> >   
> 
> This patch creates a race so this check isn't needed before this patch.
> With that I think it makes sense to have all in a single patch. If you
> feel strongly about this, I can break this change out into a patch prior
> to this one.

It's probably fine to keep it in this patch, but we should
definitely have a comment explaining why this check is needed.


Re: linux-next: Tree for Sep 11 (drivers/gpu/drm/i915/display/intel_backlight.o)

2023-09-12 Thread Randy Dunlap



On 9/12/23 00:47, Jani Nikula wrote:
> On Mon, 11 Sep 2023, Randy Dunlap  wrote:
>> On 9/10/23 19:11, Stephen Rothwell wrote:
>>> Hi all,
>>>
>>> Please do *not* include material destined for v6.7 in your linux-next
>>> included branches until *after* v6.6-rc1 has been released.  Also,
>>> do *not* rebase your linux-next included branches onto v6.5.
>>>
>>> Changes since 20230908:
>>>
>>> Non-merge commits (relative to Linus' tree): 643
>>>  614 files changed, 227990 insertions(+), 9502 deletions(-)
>>>
>>> 
>>
>> on x86_64:
>>
>> # CONFIG_ACPI is not set
>> CONFIG_DRM_I915=y
>> CONFIG_BACKLIGHT_CLASS_DEVICE=m
>>
>> I915 selects BACKLIGHT_CLASS_DEVICE if ACPI is set.
>>
>> ld: drivers/gpu/drm/i915/display/intel_backlight.o: in function 
>> `intel_backlight_device_register':
>> intel_backlight.c:(.text+0x4988): undefined reference to 
>> `backlight_device_get_by_name'
>> ld: intel_backlight.c:(.text+0x4a1b): undefined reference to 
>> `backlight_device_register'
>> ld: drivers/gpu/drm/i915/display/intel_backlight.o: in function 
>> `intel_backlight_device_unregister':
>> intel_backlight.c:(.text+0x4b56): undefined reference to 
>> `backlight_device_unregister'
> 
> This comes up periodically. The fix is for i915 to depend on backlight,
> but it's not possible to fix just i915, as it'll lead to circular deps
> unless *all* select backlight is switched to depend on backlight.
> 
> I've gone through it once [1], and not keen on doing it again unless
> there's buy-in.
> 
> IS_REACHABLE() is often suggested as a workaround, but I think it's just
> plain wrong. i915=y backlight=m is not a configuration that makes
> sense. Kernel configuration is hard enough, there's no point in allowing
> dumb configs that just silently don't work.
> 

Yes, IS_REACHABLE() is just fugly nonsense.

Thanks for the reminder of your attempt(s).

> 
> BR,
> Jani.
> 
> 
> [1] 
> https://lore.kernel.org/r/1413580403-16225-1-git-send-email-jani.nik...@intel.com
> 
> 
> 

-- 
~Randy


Re: [PATCH v3 06/13] drm/sched: Add generic scheduler message interface

2023-09-12 Thread Matthew Brost
On Tue, Sep 12, 2023 at 10:23:02AM +0200, Boris Brezillon wrote:
> On Mon, 11 Sep 2023 19:16:08 -0700
> Matthew Brost  wrote:
> 
> > Add a generic scheduler message interface which sends messages to the backend
> > from the drm_gpu_scheduler main submission thread. The idea is some of
> > these messages modify some state in drm_sched_entity which is also
> > modified during submission. By scheduling these messages and submission
> > in the same thread there is no race changing states in
> > drm_sched_entity.
> > 
> > This interface will be used in Xe, new Intel GPU driver, to cleanup,
> > suspend, resume, and change scheduling properties of a drm_sched_entity.
> > 
> > The interface is designed to be generic and extendable with only the
> > backend understanding the messages.
> 
> I didn't follow the previous discussions closely enough, but it seemed
> to me that the whole point of this 'ordered-wq for scheduler' approach
> was so you could interleave your driver-specific work items in the
> processing without changing the core. This messaging system looks like
> something that could/should be entirely driver-specific to me, and I'm
> not convinced this thin 'work -> generic_message_callback' layer is
> worth it. You can simply have your own xe_msg_process work, and a
> xe_msg_send helper that schedules this work. Assuming other drivers
> need this messaging API, they'll probably have their own message ids
> and payloads, and the automation done here is simple enough that it can
> be duplicated. That's just my personal opinion, of course, and if
> others see this message interface as valuable, I'm fine with it.

Good point. I am fine deleting this from the scheduler and making this
driver specific.

Matt
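Such a driver-private messaging layer can be modeled in a few lines of user-space C. The xe_* names below are hypothetical, not the actual Xe code: a single worker thread drains an ordered queue, so the entity state is only ever touched from that one thread and needs no extra locking — which is the whole point of routing messages through the submission thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical driver-private message; only the backend understands
 * the opcode and payload. */
struct xe_msg {
	int opcode;
	int payload;
	struct xe_msg *next;
};

struct xe_queue {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct xe_msg *head, *tail;
	int stop;
	/* Mutated only from the worker thread, so submissions and
	 * messages never race on it. */
	int entity_state;
};

static void xe_msg_send(struct xe_queue *q, int opcode, int payload)
{
	struct xe_msg *m = malloc(sizeof(*m));

	m->opcode = opcode;
	m->payload = payload;
	m->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->tail)
		q->tail->next = m;
	else
		q->head = m;
	q->tail = m;
	pthread_cond_signal(&q->cond);
	pthread_mutex_unlock(&q->lock);
}

static void *xe_worker(void *arg)
{
	struct xe_queue *q = arg;

	for (;;) {
		pthread_mutex_lock(&q->lock);
		while (!q->head && !q->stop)
			pthread_cond_wait(&q->cond, &q->lock);
		struct xe_msg *m = q->head;
		if (!m) {	/* stop requested and queue drained */
			pthread_mutex_unlock(&q->lock);
			return NULL;
		}
		q->head = m->next;
		if (!q->head)
			q->tail = NULL;
		pthread_mutex_unlock(&q->lock);

		/* "Process" the message; no lock needed for entity_state. */
		q->entity_state += m->payload;
		free(m);
	}
}
```

In the kernel version the worker would simply be a work item queued on the scheduler's ordered workqueue instead of a dedicated thread.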


Re: [PATCH 3/9] dma-heap: Provide accessors so that in-kernel drivers can allocate dmabufs from specific heaps

2023-09-12 Thread Nicolas Dufresne
On Monday, September 11, 2023 at 12:13 +0200, Christian König wrote:
> Am 11.09.23 um 04:30 schrieb Yong Wu:
> > From: John Stultz 
> > 
> > This allows drivers who don't want to create their own
> > DMA-BUF exporter to be able to allocate DMA-BUFs directly
> > from existing DMA-BUF Heaps.
> > 
> > There is some concern that the premise of DMA-BUF heaps is
> > that userland knows better about what type of heap memory
> > is needed for a pipeline, so it would likely be best for
> > drivers to import and fill DMA-BUFs allocated by userland
> > instead of allocating one themselves, but this is still
> > up for debate.
> 
> The main design goal of having DMA-heaps in the first place is to avoid 
> per driver allocation and this is not necessary because userland knows 
> better what type of memory it wants.

If the memory is user visible, yes. When I look at the MTK VCODEC changes, this
seems to be used for internal codec state and SHM buffers used to communicate
with firmware.

> 
> The background is rather that we generally want to decouple allocation 
> from having a device driver connection so that we have better chance 
> that multiple devices can work with the same memory.
> 
> I once created a prototype which gives userspace a hint which DMA-heap to 
> use for which device: 
> https://patchwork.kernel.org/project/linux-media/patch/20230123123756.401692-2-christian.koe...@amd.com/
> 
> Problem is that I don't really have time to look into it and maintain 
> that stuff, but I think from the high level design that is rather the 
> general direction we should push at.
> 
> Regards,
> Christian.
> 
> > 
> > Signed-off-by: John Stultz 
> > Signed-off-by: T.J. Mercier 
> > Signed-off-by: Yong Wu 
> > [Yong: Fix the checkpatch alignment warning]
> > ---
> >   drivers/dma-buf/dma-heap.c | 60 --
> >   include/linux/dma-heap.h   | 25 
> >   2 files changed, 69 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
> > index dcc0e38c61fa..908bb30dc864 100644
> > --- a/drivers/dma-buf/dma-heap.c
> > +++ b/drivers/dma-buf/dma-heap.c
> > @@ -53,12 +53,15 @@ static dev_t dma_heap_devt;
> >   static struct class *dma_heap_class;
> >   static DEFINE_XARRAY_ALLOC(dma_heap_minors);
> >   
> > -static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> > -unsigned int fd_flags,
> > -unsigned int heap_flags)
> > +struct dma_buf *dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
> > + unsigned int fd_flags,
> > + unsigned int heap_flags)
> >   {
> > -   struct dma_buf *dmabuf;
> > -   int fd;
> > +   if (fd_flags & ~DMA_HEAP_VALID_FD_FLAGS)
> > +   return ERR_PTR(-EINVAL);
> > +
> > +   if (heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
> > +   return ERR_PTR(-EINVAL);
> >   
> > /*
> >  * Allocations from all heaps have to begin
> > @@ -66,9 +69,20 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, 
> > size_t len,
> >  */
> > len = PAGE_ALIGN(len);
> > if (!len)
> > -   return -EINVAL;
> > +   return ERR_PTR(-EINVAL);
> >   
> > -   dmabuf = heap->ops->allocate(heap, len, fd_flags, heap_flags);
> > +   return heap->ops->allocate(heap, len, fd_flags, heap_flags);
> > +}
> > +EXPORT_SYMBOL_GPL(dma_heap_buffer_alloc);
> > +
> > +static int dma_heap_bufferfd_alloc(struct dma_heap *heap, size_t len,
> > +  unsigned int fd_flags,
> > +  unsigned int heap_flags)
> > +{
> > +   struct dma_buf *dmabuf;
> > +   int fd;
> > +
> > +   dmabuf = dma_heap_buffer_alloc(heap, len, fd_flags, heap_flags);
> > if (IS_ERR(dmabuf))
> > return PTR_ERR(dmabuf);
> >   
> > @@ -106,15 +120,9 @@ static long dma_heap_ioctl_allocate(struct file *file, 
> > void *data)
> > if (heap_allocation->fd)
> > return -EINVAL;
> >   
> > -   if (heap_allocation->fd_flags & ~DMA_HEAP_VALID_FD_FLAGS)
> > -   return -EINVAL;
> > -
> > -   if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
> > -   return -EINVAL;
> > -
> > -   fd = dma_heap_buffer_alloc(heap, heap_allocation->len,
> > -  heap_allocation->fd_flags,
> > -  heap_allocation->heap_flags);
> > +   fd = dma_heap_bufferfd_alloc(heap, heap_allocation->len,
> > +heap_allocation->fd_flags,
> > +heap_allocation->heap_flags);
> > if (fd < 0)
> > return fd;
> >   
> > @@ -205,6 +213,7 @@ const char *dma_heap_get_name(struct dma_heap *heap)
> >   {
> > return heap->name;
> >   }
> > +EXPORT_SYMBOL_GPL(dma_heap_get_name);
> >   
> >   struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
> >   {
> > @@ -290,6 +299,24 @@ struct dma_heap *dma_heap_add(const 

Re: [PATCH v2 1/9] drm/sched: Convert drm scheduler to use a work queue rather than kthread

2023-09-12 Thread Boris Brezillon
On Tue, 12 Sep 2023 16:33:01 +0200
Danilo Krummrich  wrote:

> On 9/12/23 16:28, Boris Brezillon wrote:
> > On Thu, 17 Aug 2023 13:13:31 +0200
> > Danilo Krummrich  wrote:
> >   
> >> I think that's a misunderstanding. I'm not trying to say that it is
> >> *always* beneficial to fill up the ring as much as possible. But I think
> >> it is under certain circumstances, exactly those circumstances I
> >> described for Nouveau.
> >>
> >> As mentioned, in Nouveau the size of a job is only really limited by the
> >> ring size, which means that one job can (but does not necessarily) fill
> >> up the whole ring. We both agree that this is inefficient, because it
> >> potentially results into the HW run dry due to hw_submission_limit == 1.
> >>
> >> I recognize you said that one should define hw_submission_limit and
> >> adjust the other parts of the equation accordingly, the options I see are:
> >>
> >> (1) Increase the ring size while keeping the maximum job size.
> >> (2) Decrease the maximum job size while keeping the ring size.
> >> (3) Let the scheduler track the actual job size rather than the maximum
> >> job size.
> >>
> >> (1) results into potentially wasted ring memory, because we're not
> >> always reaching the maximum job size, but the scheduler assumes so.
> >>
> >> (2) results into more IOCTLs from userspace for the same amount of IBs
> >> and more jobs result into more memory allocations and more work being
> >> submitted to the workqueue (with Matt's patches).
> >>
> >> (3) doesn't seem to have any of those draw backs.
> >>
> >> What would be your take on that?
> >>
> >> Actually, if none of the other drivers is interested into a more precise
> >> way of keeping track of the ring utilization, I'd be totally fine to do
> >> it in a driver specific way. However, unfortunately I don't see how this
> >> would be possible.  
> > 
> > I'm not entirely sure, but I think PowerVR is pretty close to your
> > description: jobs size is dynamic size, and the ring buffer size is
> > picked by the driver at queue initialization time. What we did was to
> > set hw_submission_limit to an arbitrarily high value of 64k (we could
> > have used something like ringbuf_size/min_job_size instead), and then
> > have the control flow implemented with ->prepare_job() [1] (CCCB is the
> > PowerVR ring buffer). This allows us to maximize ring buffer utilization
> > while still allowing dynamic-size jobs.  
> 
> I guess this would work, but I think it would be better to bake this in,
> especially if more drivers do have this need. I already have an
> implementation [1] for doing that in the scheduler. My plan was to push
> that as soon as Matt sends out V3.
> 
> [1] 
> https://gitlab.freedesktop.org/nouvelles/kernel/-/commit/269f05d6a2255384badff8b008b3c32d640d2d95

PowerVR's ->can_fit_in_ringbuf() logic is a bit more involved in that
native fences waits are passed to the FW, and those add to the job size.
When we know our job is ready for execution (all non-native deps are
signaled), we evict already signaled native-deps (or native fences) to
shrink the job size further, but that's something we need to
calculate late if we want the job size to be minimal. Of course, we can
always over-estimate the job size, but if we go for a full-blown
drm_sched integration, I wonder if it wouldn't be preferable to have a
->get_job_size() callback returning the number of units needed by the job,
and have the core pick 1 when the hook is not implemented.


Re: [PATCH v3 11/13] drm/sched: Waiting for pending jobs to complete in scheduler kill

2023-09-12 Thread Matthew Brost
On Tue, Sep 12, 2023 at 11:57:30AM +0200, Christian König wrote:
> On 12.09.23 04:16, Matthew Brost wrote:
> > Wait for pending jobs to be complete before signaling queued jobs. This
> > ensures dma-fence signaling order is correct and also ensures the entity is
> > not running on the hardware after drm_sched_entity_flush or
> > drm_sched_entity_fini returns.
> 
> Entities are *not* supposed to outlive the submissions they carry and we
> absolutely *can't* wait for submissions to finish while killing the entity.
> 
> In other words it is perfectly expected that entities don't exist any
> more while the submissions they carried are still running on the hardware.
> 
> I somehow need to better document how this is working and especially why it
> is working like that.
> 
> This approach came up like four or five times now and we already applied and
> reverted patches doing this.
> 
> For now let's take a look at the source code of drm_sched_entity_kill():
> 
>    /* The entity is guaranteed to not be used by the scheduler */
>     prev = rcu_dereference_check(entity->last_scheduled, true);
>     dma_fence_get(prev);
> 
>     while ((job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue
> {
>     struct drm_sched_fence *s_fence = job->s_fence;
> 
>     dma_fence_get(&s_fence->finished);
>     if (!prev || dma_fence_add_callback(prev, &job->finish_cb,
> drm_sched_entity_kill_jobs_cb))
>     drm_sched_entity_kill_jobs_cb(NULL,
> &job->finish_cb);
> 
>     prev = &s_fence->finished;
>     }
>     dma_fence_put(prev);
> 
> This ensures the dma-fence signaling order by delegating signaling of the
> scheduler fences into callbacks.
> 

Thanks for the explanation, this code makes more sense now. Agree this
patch is not correct.

This patch really is an RFC for something Nouveau needs, I can delete
this patch in the next rev and let Nouveau run with a slightly different
version if needed.

Matt

> Regards,
> Christian.
> 
> > 
> > Signed-off-by: Matthew Brost 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  2 +-
> >   drivers/gpu/drm/scheduler/sched_entity.c|  7 ++-
> >   drivers/gpu/drm/scheduler/sched_main.c  | 50 ++---
> >   include/drm/gpu_scheduler.h | 18 
> >   4 files changed, 70 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > index fb5dad687168..7835c0da65c5 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > @@ -1873,7 +1873,7 @@ static void amdgpu_ib_preempt_mark_partial_job(struct 
> > amdgpu_ring *ring)
> > list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) {
> > if (dma_fence_is_signaled(&s_job->s_fence->finished)) {
> > /* remove job from ring_mirror_list */
> > -   list_del_init(&s_job->list);
> > +   drm_sched_remove_pending_job(s_job);
> > sched->ops->free_job(s_job);
> > continue;
> > }
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
> > b/drivers/gpu/drm/scheduler/sched_entity.c
> > index 1dec97caaba3..37557fbb96d0 100644
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > @@ -104,9 +104,11 @@ int drm_sched_entity_init(struct drm_sched_entity 
> > *entity,
> > }
> > init_completion(&entity->entity_idle);
> > +   init_completion(&entity->jobs_done);
> > -   /* We start in an idle state. */
> > +   /* We start in an idle and jobs done state. */
> > complete_all(&entity->entity_idle);
> > +   complete_all(&entity->jobs_done);
> > spin_lock_init(&entity->rq_lock);
> > spsc_queue_init(&entity->job_queue);
> > @@ -256,6 +258,9 @@ static void drm_sched_entity_kill(struct 
> > drm_sched_entity *entity)
> > /* Make sure this entity is not used by the scheduler at the moment */
> > wait_for_completion(&entity->entity_idle);
> > +   /* Make sure all pending jobs are done */
> > +   wait_for_completion(&entity->jobs_done);
> > +
> > /* The entity is guaranteed to not be used by the scheduler */
> > prev = rcu_dereference_check(entity->last_scheduled, true);
> > dma_fence_get(prev);
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> > b/drivers/gpu/drm/scheduler/sched_main.c
> > index 689fb6686e01..ed6f5680793a 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -510,12 +510,52 @@ void drm_sched_resume_timeout(struct 
> > drm_gpu_scheduler *sched,
> >   }
> >   EXPORT_SYMBOL(drm_sched_resume_timeout);
> > +/**
> > + * drm_sched_add_pending_job - Add pending job to scheduler
> > + *
> > + * @job: scheduler job to add
> > + * @tail: add to tail of pending list
> > + */
> > +void drm_sched
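The callback chaining Christian describes above can be modeled single-threaded in plain C (toy types, not the real dma-fence API): each queued job's fence is signaled only from the completion callback of the previous fence, so the scheduler fences signal strictly in submission order even though none of the jobs ever runs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FENCES 8

/* Toy stand-in for a dma-fence with a single chained callback. */
struct toy_fence {
	int id;
	int signaled;
	struct toy_fence *chained;	/* fence to signal when this one signals */
};

static int signal_log[MAX_FENCES];
static int signal_count;

static void fence_signal(struct toy_fence *f)
{
	f->signaled = 1;
	signal_log[signal_count++] = f->id;
	if (f->chained)
		fence_signal(f->chained);	/* the "kill_jobs" callback */
}

/* Mirrors the loop in drm_sched_entity_kill(): walk the queued jobs,
 * chaining each finished-fence to the previous one. "prev" starts as
 * the entity's last_scheduled fence. */
static void entity_kill(struct toy_fence *prev, struct toy_fence *jobs, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		if (!prev || prev->signaled)
			fence_signal(&jobs[i]);	/* callback fires immediately */
		else
			prev->chained = &jobs[i];	/* dma_fence_add_callback() */
		prev = &jobs[i];
	}
}
```

Signaling the head of the chain then cascades through every queued job's fence in order, which is how signaling order is preserved without waiting for the hardware.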

Re: [PATCH 3/9] dma-heap: Provide accessors so that in-kernel drivers can allocate dmabufs from specific heaps

2023-09-12 Thread Christian König

On 12.09.23 10:52, Yong Wu (吴勇) wrote:

[SNIP]

But what we should try to avoid is that newly merged drivers provide
both a driver specific UAPI and DMA-heaps. The justification that this
makes it easier to transition userspace to the new UAPI doesn't really
count.

That would be adding UAPI already with a plan to deprecate it and that
is most likely not helpful considering that UAPI must be supported
forever as soon as it is upstream.

Sorry, I didn't understand this. I think we have not changed the UAPI.
Which code are you referring to?


Well, what do you need this for if not a new UAPI?

My assumption here is that you need to export the DMA-heap allocation 
function so that you can serve a UAPI in your new driver. Or what else 
is that good for?


As far as I understand you are trying to upstream your new vcodec driver. So 
while this change here seems to be a good idea to clean up existing 
drivers it doesn't look like a good idea for a newly created driver.


Regards,
Christian.


So I think this patch is a little confusing in this series, as I don't
see much of it actually being used here (though forgive me if I'm
missing it).

Instead, it seems it gets used in a separate patch series here:


https://lore.kernel.org/all/20230911125936.10648-1-yunfei.d...@mediatek.com/

Please try to avoid stuff like that, it is really confusing and eats
reviewers' time.

My fault, I thought dma-buf and media belonged to different trees,
so I sent them separately. The cover letter just said "The consumers of
the new heap and new interface are our codecs and DRM, which will be
sent upstream soon", and there was no vcodec link at that time.

In the next version, we will put the first three patches into the
vcodec patchset.

Thanks.
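The split the patch makes — an object-returning allocator plus a thin fd wrapper for the ioctl path — can be modeled in user-space C. These are toy types: in the kernel the allocator returns a struct dma_buf * (or an ERR_PTR), and the wrapper installs a real file descriptor via dma_buf_fd(). In-kernel users would call the object-returning variant directly.

```c
#include <assert.h>
#include <stddef.h>

struct toy_buf {
	size_t len;
};

static struct toy_buf pool[16];
static int used;

/* The core allocator: returns an object, as dma_heap_buffer_alloc()
 * does after the patch. Rounds the length up like PAGE_ALIGN(). */
static struct toy_buf *heap_buffer_alloc(size_t len)
{
	if (!len || used == 16)
		return NULL;	/* ERR_PTR(-EINVAL/-ENOMEM) in the kernel */
	pool[used].len = (len + 4095) & ~(size_t)4095;
	return &pool[used++];
}

/* The ioctl path: allocate, then hand out an "fd" (here, an index),
 * mirroring dma_heap_bufferfd_alloc() in the patch. */
static int heap_bufferfd_alloc(size_t len)
{
	struct toy_buf *buf = heap_buffer_alloc(len);

	if (!buf)
		return -1;	/* PTR_ERR(buf) in the kernel */
	return (int)(buf - pool);	/* dma_buf_fd() in the kernel */
}
```

Keeping validation and alignment in the core function means the userspace ioctl path and any in-kernel caller get identical semantics, which is the cleanup the patch is after.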





Re: [PATCH v3 05/13] drm/sched: Split free_job into own work item

2023-09-12 Thread Matthew Brost
On Tue, Sep 12, 2023 at 10:08:33AM +0200, Boris Brezillon wrote:
> On Mon, 11 Sep 2023 19:16:07 -0700
> Matthew Brost  wrote:
> 
> > Rather than call free_job and run_job in same work item have a dedicated
> > work item for each. This aligns with the design and intended use of work
> > queues.
> > 
> > v2:
> >- Test for DMA_FENCE_FLAG_TIMESTAMP_BIT before setting
> >  timestamp in free_job() work item (Danilo)
> > 
> > Signed-off-by: Matthew Brost 
> > ---
> >  drivers/gpu/drm/scheduler/sched_main.c | 143 ++---
> >  include/drm/gpu_scheduler.h|   8 +-
> >  2 files changed, 110 insertions(+), 41 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> > b/drivers/gpu/drm/scheduler/sched_main.c
> > index 3820e9ae12c8..d28b6751256e 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -213,11 +213,12 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq 
> > *rq,
> >   * drm_sched_rq_select_entity_rr - Select an entity which could provide a 
> > job to run
> >   *
> >   * @rq: scheduler run queue to check.
> > + * @dequeue: dequeue selected entity
> >   *
> >   * Try to find a ready entity, returns NULL if none found.
> >   */
> >  static struct drm_sched_entity *
> > -drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
> > +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq, bool dequeue)
> >  {
> > struct drm_sched_entity *entity;
> >  
> > @@ -227,8 +228,10 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
> > if (entity) {
> > list_for_each_entry_continue(entity, &rq->entities, list) {
> > if (drm_sched_entity_is_ready(entity)) {
> > -   rq->current_entity = entity;
> > -   reinit_completion(&entity->entity_idle);
> > +   if (dequeue) {
> > +   rq->current_entity = entity;
> > +   reinit_completion(&entity->entity_idle);
> > +   }
> > spin_unlock(&rq->lock);
> > return entity;
> > }
> > @@ -238,8 +241,10 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
> > list_for_each_entry(entity, &rq->entities, list) {
> >  
> > if (drm_sched_entity_is_ready(entity)) {
> > -   rq->current_entity = entity;
> > -   reinit_completion(&entity->entity_idle);
> > +   if (dequeue) {
> > +   rq->current_entity = entity;
> > +   reinit_completion(&entity->entity_idle);
> > +   }
> > spin_unlock(&rq->lock);
> > return entity;
> > }
> > @@ -257,11 +262,12 @@ drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
> >   * drm_sched_rq_select_entity_fifo - Select an entity which provides a job 
> > to run
> >   *
> >   * @rq: scheduler run queue to check.
> > + * @dequeue: dequeue selected entity
> >   *
> >   * Find oldest waiting ready entity, returns NULL if none found.
> >   */
> >  static struct drm_sched_entity *
> > -drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
> > +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq, bool dequeue)
> >  {
> > struct rb_node *rb;
> >  
> > @@ -271,8 +277,10 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq 
> > *rq)
> >  
> > entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
> > if (drm_sched_entity_is_ready(entity)) {
> > -   rq->current_entity = entity;
> > -   reinit_completion(&entity->entity_idle);
> > +   if (dequeue) {
> > +   rq->current_entity = entity;
> > +   reinit_completion(&entity->entity_idle);
> > +   }
> > break;
> > }
> > }
> > @@ -282,13 +290,54 @@ drm_sched_rq_select_entity_fifo(struct drm_sched_rq 
> > *rq)
> >  }
> >  
> >  /**
> > - * drm_sched_submit_queue - scheduler queue submission
> > + * drm_sched_run_job_queue - queue job submission
> >   * @sched: scheduler instance
> >   */
> > -static void drm_sched_submit_queue(struct drm_gpu_scheduler *sched)
> > +static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
> >  {
> > if (!READ_ONCE(sched->pause_submit))
> > -   queue_work(sched->submit_wq, &sched->work_submit);
> > +   queue_work(sched->submit_wq, &sched->work_run_job);
> > +}
> > +
> > +static struct drm_sched_entity *
> > +drm_sched_select_entity(struct drm_gpu_scheduler *sched, bool dequeue);
> 
> Nit: Can you drop this forward declaration and move the function here?
>

Sure. Will likely move this function in a separate patch though.
 
> > +
> > +/**
> > + * drm_sched_run_job_queue_if_ready - queue job submission if ready
> > + * @sch

Re: [PATCH] drm/simpledrm: Add support for multiple "power-domains"

2023-09-12 Thread kernel test robot
Hi Janne,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 15d30b46573d75f5cb58cfacded8ebab9c76a2b0]

url:
https://github.com/intel-lab-lkp/linux/commits/Janne-Grunau-via-B4-Relay/drm-simpledrm-Add-support-for-multiple-power-domains/20230911-004026
base:   15d30b46573d75f5cb58cfacded8ebab9c76a2b0
patch link:
https://lore.kernel.org/r/20230910-simpledrm-multiple-power-domains-v1-1-f8718aefc685%40jannau.net
patch subject: [PATCH] drm/simpledrm: Add support for multiple "power-domains"
config: arm64-randconfig-r003-20230912 
(https://download.01.org/0day-ci/archive/20230912/202309122212.metcn4uk-...@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 
4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20230912/202309122212.metcn4uk-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202309122212.metcn4uk-...@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/tiny/simpledrm.c:506:24: warning: flag ' ' results in 
>> undefined behavior with 'p' conversion specifier [-Wformat]
 506 | drm_err(&sdev->dev, "% power-domains count:%d\n", __func__, 
sdev->pwr_dom_count);
 |  ~^~
   include/drm/drm_print.h:469:39: note: expanded from macro 'drm_err'
 469 | __drm_printk((drm), err,, "*ERROR* " fmt, ##__VA_ARGS__)
 |  ^~~
   include/drm/drm_print.h:456:41: note: expanded from macro '__drm_printk'
 456 | dev_##level##type((drm)->dev, "[drm] " fmt, ##__VA_ARGS__)
 |^~~
   include/linux/dev_printk.h:144:57: note: expanded from macro 'dev_err'
 144 | dev_printk_index_wrap(_dev_err, KERN_ERR, dev, dev_fmt(fmt), 
##__VA_ARGS__)
 |^~~
   include/linux/dev_printk.h:19:22: note: expanded from macro 'dev_fmt'
  19 | #define dev_fmt(fmt) fmt
 |  ^~~
   include/linux/dev_printk.h:110:16: note: expanded from macro 
'dev_printk_index_wrap'
 110 | _p_func(dev, fmt, ##__VA_ARGS__);
   \
 |  ^~~
   1 warning generated.


vim +506 drivers/gpu/drm/tiny/simpledrm.c

   477  
   478  #if defined CONFIG_OF && defined CONFIG_PM_GENERIC_DOMAINS
   479  /*
   480   * Generic power domain handling code.
   481   *
   482   * Here we handle the power-domains properties of our 
"simple-framebuffer"
   483   * dt node. This is only necessary if there is more than one 
power-domain.
   484   * A single power-domains is handled automatically by the driver core. 
Multiple
   485   * power-domains have to be handled by drivers since the driver core 
can't know
   486   * the correct power sequencing. Power sequencing is not an issue for 
simpledrm
   487   * since the bootloader has put the power domains already in the 
correct state.
   488   * simpledrm has only to ensure they remain active for its lifetime.
   489   *
   490   * When the driver unloads, we detach from the power-domains.
   491   *
   492   * We only complain about errors here, no action is taken as the most 
likely
   493   * error can only happen due to a mismatch between the bootloader which 
set
   494   * up the "simple-framebuffer" dt node, and the PM domain providers in 
the
   495   * device tree. Chances are that there are no adverse effects, and if 
there are,
   496   * a clean teardown of the fb probe will not help us much either. So 
just
   497   * complain and carry on, and hope that the user actually gets a 
working fb at
   498   * the end of things.
   499   */
   500  static void simpledrm_device_detach_genpd(void *res)
   501  {
   502  int i;
   503  struct simpledrm_device *sdev = /*(struct simpledrm_device 
*)*/res;
   504  
   505  
 > 506  drm_err(&sdev->dev, "% power-domains count:%d\n", __func__, 
 > sdev->pwr_dom_count);
   507  if (sdev->pwr_dom_count <= 1)
   508  return;
   509  
   510  for (i = sdev->pwr_dom_count - 1; i >= 0; i--) {
   511  if (!sdev->pwr_dom_links[i])
   512  device_link_del(sdev->pwr_dom_links[i]);
   513  if (!IS_ERR_OR_NULL(sdev->pwr_dom_devs[i]))
   514  dev_pm_domain_detach(sdev->pwr_dom_devs[i], 
true);
   515  }
   516  }
   517  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Re: [PATCH v2 1/9] drm/sched: Convert drm scheduler to use a work queue rather than kthread

2023-09-12 Thread Danilo Krummrich

On 9/12/23 16:28, Boris Brezillon wrote:

On Thu, 17 Aug 2023 13:13:31 +0200
Danilo Krummrich  wrote:


I think that's a misunderstanding. I'm not trying to say that it is
*always* beneficial to fill up the ring as much as possible. But I think
it is under certain circumstances, exactly those circumstances I
described for Nouveau.

As mentioned, in Nouveau the size of a job is only really limited by the
ring size, which means that one job can (but does not necessarily) fill
up the whole ring. We both agree that this is inefficient, because it
potentially results in the HW running dry due to hw_submission_limit == 1.

I recognize you said that one should define hw_submission_limit and
adjust the other parts of the equation accordingly, the options I see are:

(1) Increase the ring size while keeping the maximum job size.
(2) Decrease the maximum job size while keeping the ring size.
(3) Let the scheduler track the actual job size rather than the maximum
job size.

(1) results into potentially wasted ring memory, because we're not
always reaching the maximum job size, but the scheduler assumes so.

(2) results into more IOCTLs from userspace for the same amount of IBs
and more jobs result into more memory allocations and more work being
submitted to the workqueue (with Matt's patches).

(3) doesn't seem to have any of those draw backs.

What would be your take on that?

Actually, if none of the other drivers is interested into a more precise
way of keeping track of the ring utilization, I'd be totally fine to do
it in a driver specific way. However, unfortunately I don't see how this
would be possible.


I'm not entirely sure, but I think PowerVR is pretty close to your
description: jobs size is dynamic size, and the ring buffer size is
picked by the driver at queue initialization time. What we did was to
set hw_submission_limit to an arbitrarily high value of 64k (we could
have used something like ringbuf_size/min_job_size instead), and then
have the control flow implemented with ->prepare_job() [1] (CCCB is the
PowerVR ring buffer). This allows us to maximize ring buffer utilization
while still allowing dynamic-size jobs.


I guess this would work, but I think it would be better to bake this in,
especially if more drivers do have this need. I already have an
implementation [1] for doing that in the scheduler. My plan was to push
that as soon as Matt sends out V3.

[1] 
https://gitlab.freedesktop.org/nouvelles/kernel/-/commit/269f05d6a2255384badff8b008b3c32d640d2d95





My proposal would be to just keep the hw_submission_limit (maybe rename
it to submission_unit_limit) and add a submission_units field to struct
drm_sched_job. By default a jobs submission_units field would be 0 and
the scheduler would behave the exact same way as it does now.

Accordingly, jobs with submission_units > 1 would contribute more than
one unit to the submission_unit_limit.

What do you think about that?

Besides all that, you said that filling up the ring just enough to not
let the HW run dry, rather than filling it up entirely, is desirable. Why
do you think so? I tend to think that in most cases it shouldn't make a
difference.


[1]https://gitlab.freedesktop.org/frankbinns/powervr/-/blob/powervr-next/drivers/gpu/drm/imagination/pvr_queue.c#L502





Re: [PATCH v2 1/9] drm/sched: Convert drm scheduler to use a work queue rather than kthread

2023-09-12 Thread Boris Brezillon
On Thu, 17 Aug 2023 13:13:31 +0200
Danilo Krummrich  wrote:

> I think that's a misunderstanding. I'm not trying to say that it is 
> *always* beneficial to fill up the ring as much as possible. But I think 
> it is under certain circumstances, exactly those circumstances I 
> described for Nouveau.
> 
> As mentioned, in Nouveau the size of a job is only really limited by the 
> ring size, which means that one job can (but does not necessarily) fill 
> up the whole ring. We both agree that this is inefficient, because it 
> potentially results into the HW run dry due to hw_submission_limit == 1.
> 
> I recognize you said that one should define hw_submission_limit and 
> adjust the other parts of the equation accordingly, the options I see are:
> 
> (1) Increase the ring size while keeping the maximum job size.
> (2) Decrease the maximum job size while keeping the ring size.
> (3) Let the scheduler track the actual job size rather than the maximum 
> job size.
> 
> (1) results into potentially wasted ring memory, because we're not 
> always reaching the maximum job size, but the scheduler assumes so.
> 
> (2) results into more IOCTLs from userspace for the same amount of IBs 
> and more jobs result into more memory allocations and more work being 
> submitted to the workqueue (with Matt's patches).
> 
> (3) doesn't seem to have any of those draw backs.
> 
> What would be your take on that?
> 
> Actually, if none of the other drivers is interested into a more precise 
> way of keeping track of the ring utilization, I'd be totally fine to do 
> it in a driver specific way. However, unfortunately I don't see how this 
> would be possible.

I'm not entirely sure, but I think PowerVR is pretty close to your
description: jobs size is dynamic size, and the ring buffer size is
picked by the driver at queue initialization time. What we did was to
set hw_submission_limit to an arbitrarily high value of 64k (we could
have used something like ringbuf_size/min_job_size instead), and then
have the control flow implemented with ->prepare_job() [1] (CCCB is the
PowerVR ring buffer). This allows us to maximize ring buffer utilization
while still allowing dynamic-size jobs.

> 
> My proposal would be to just keep the hw_submission_limit (maybe rename 
> it to submission_unit_limit) and add a submission_units field to struct 
> drm_sched_job. By default a jobs submission_units field would be 0 and 
> the scheduler would behave the exact same way as it does now.
> 
> Accordingly, jobs with submission_units > 1 would contribute more than 
> one unit to the submission_unit_limit.
> 
> What do you think about that?
> 
> Besides all that, you said that filling up the ring just enough to not 
> let the HW run dry rather than filling it up entirely is desirable. Why 
> do you think so? I tend to think that in most cases it shouldn't make 
> difference.

[1]https://gitlab.freedesktop.org/frankbinns/powervr/-/blob/powervr-next/drivers/gpu/drm/imagination/pvr_queue.c#L502


Re: [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity

2023-09-12 Thread kernel test robot
Hi Matthew,

kernel test robot noticed the following build errors:

[auto build test ERROR on drm/drm-next]
[also build test ERROR on drm-exynos/exynos-drm-next drm-intel/for-linux-next-fixes drm-tip/drm-tip linus/master v6.6-rc1 next-20230912]
[cannot apply to drm-misc/drm-misc-next drm-intel/for-linux-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Matthew-Brost/drm-sched-Add-drm_sched_submit_-helpers/20230912-102001
base:   git://anongit.freedesktop.org/drm/drm drm-next
patch link:
https://lore.kernel.org/r/20230912021615.2086698-4-matthew.brost%40intel.com
patch subject: [PATCH v3 03/13] drm/sched: Move schedule policy to scheduler / entity
config: arm64-randconfig-r032-20230912 (https://download.01.org/0day-ci/archive/20230912/202309122100.haei8ytj-...@intel.com/config)
compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230912/202309122100.haei8ytj-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: https://lore.kernel.org/oe-kbuild-all/202309122100.haei8ytj-...@intel.com/

All errors (new ones prefixed by >>):

>> drivers/gpu/drm/v3d/v3d_sched.c:403:9: error: use of undeclared identifier 'ULL'
ULL, "v3d_render", DRM_SCHED_POLICY_DEFAULT,
^
   1 error generated.


vim +/ULL +403 drivers/gpu/drm/v3d/v3d_sched.c

   381  
   382  int
   383  v3d_sched_init(struct v3d_dev *v3d)
   384  {
   385  int hw_jobs_limit = 1;
   386  int job_hang_limit = 0;
   387  int hang_limit_ms = 500;
   388  int ret;
   389  
   390  ret = drm_sched_init(&v3d->queue[V3D_BIN].sched,
   391   &v3d_bin_sched_ops, NULL,
   392   hw_jobs_limit, job_hang_limit,
   393   msecs_to_jiffies(hang_limit_ms), NULL,
   394   NULL, "v3d_bin", DRM_SCHED_POLICY_DEFAULT,
   395   v3d->drm.dev);
   396  if (ret)
   397  return ret;
   398  
   399  ret = drm_sched_init(&v3d->queue[V3D_RENDER].sched,
   400   &v3d_render_sched_ops, NULL,
   401   hw_jobs_limit, job_hang_limit,
   402   msecs_to_jiffies(hang_limit_ms), NULL,
  > 403   ULL, "v3d_render", DRM_SCHED_POLICY_DEFAULT,
   404   v3d->drm.dev);
   405  if (ret)
   406  goto fail;
   407  
   408  ret = drm_sched_init(&v3d->queue[V3D_TFU].sched,
   409   &v3d_tfu_sched_ops, NULL,
   410   hw_jobs_limit, job_hang_limit,
   411   msecs_to_jiffies(hang_limit_ms), NULL,
   412   NULL, "v3d_tfu", DRM_SCHED_POLICY_DEFAULT,
   413   v3d->drm.dev);
   414  if (ret)
   415  goto fail;
   416  
   417  if (v3d_has_csd(v3d)) {
   418  ret = drm_sched_init(&v3d->queue[V3D_CSD].sched,
   419   &v3d_csd_sched_ops, NULL,
   420   hw_jobs_limit, job_hang_limit,
   421   msecs_to_jiffies(hang_limit_ms), NULL,
   422   NULL, "v3d_csd", DRM_SCHED_POLICY_DEFAULT,
   423   v3d->drm.dev);
   424  if (ret)
   425  goto fail;
   426  
   427  ret = drm_sched_init(&v3d->queue[V3D_CACHE_CLEAN].sched,
   428   &v3d_cache_clean_sched_ops, NULL,
   429   hw_jobs_limit, job_hang_limit,
   430   msecs_to_jiffies(hang_limit_ms), NULL,
   431   NULL, "v3d_cache_clean",
   432   DRM_SCHED_POLICY_DEFAULT, v3d->drm.dev);
   433  if (ret)
   434  goto fail;
   435  }
   436  
   437  return 0;
   438  
   439  fail:
   440  v3d_sched_fini(v3d);
   441  return ret;
   442  }
   443  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Re: [PATCH v2 4/9] drm/sched: Split free_job into own work item

2023-09-12 Thread Danilo Krummrich
On Tue, Sep 12, 2023 at 03:52:28PM +0200, Boris Brezillon wrote:
> On Tue, 12 Sep 2023 14:56:06 +0200
> Danilo Krummrich  wrote:
> 
> > On Tue, Sep 12, 2023 at 02:18:18PM +0200, Boris Brezillon wrote:
> > > On Tue, 12 Sep 2023 12:46:26 +0200
> > > Danilo Krummrich  wrote:
> > >   
> > > > > I'm a bit worried that leaving this single vs multi-threaded wq
> > > > > decision to drivers is going to cause unnecessary pain, because what
> > > > > was previously a granted in term of run/cleanup execution order 
> > > > > (thanks
> > > > > to the kthread+static-drm_sched_main-workflow approach) is now subject
> > > > > to the wq ordering guarantees, which depend on the wq type picked by
> > > > > the driver.
> > > > 
> > > > Not sure if this ends up to be much different. The only thing I could 
> > > > think of
> > > > is that IIRC with the kthread implementation cleanup was always 
> > > > preferred over
> > > > run.  
> > > 
> > > Given the sequence in drm_sched_main(), I'd say that cleanup and run
> > > operations are naturally interleaved when both are available, but I
> > > might be wrong.  
> > 
> > From drm_sched_main():
> > 
> > wait_event_interruptible(sched->wake_up_worker,
> >  (cleanup_job = 
> > drm_sched_get_cleanup_job(sched)) ||
> >  (!drm_sched_blocked(sched) &&
> >   (entity = drm_sched_select_entity(sched))) ||
> >  kthread_should_stop());
> > 
> > if (cleanup_job)
> > sched->ops->free_job(cleanup_job);
> > 
> > if (!entity)
> > continue;
> > 
> > If cleanup_job is not NULL the rest shouldn't be evaluated I guess. Hence 
> > entity
> > would be NULL and we'd loop until there are no more cleanup_jobs if I don't 
> > miss
> > anything here.
> 
> Indeed, I got tricked by the wait_event() expression.
> 
> > 
> > >   
> > > > With a single threaded wq this should be a bit more balanced.  
> > > 
> > > With a single threaded wq, it's less clear, because each work
> > > reschedules itself for further processing, but it's likely to be more
> > > or less interleaved. Anyway, I'm not too worried about cleanup taking
> > > precedence on run or the other way around, because the limited amount
> > > of HW slots (size of the ring-buffer) will regulate that.  
> > 
> > Yeah, that's what I meant, with two work items rescheduling themselves it
> > starts to be interleaved. Which I'm not worried about as well.
> > 
> > >   
> > > > 
> > > > With a multi-threaded wq it's still the same, but run and cleanup can 
> > > > run
> > > > concurrently,  
> > > 
> > > What I'm worried about is that ^. I'm not saying it's fundamentally
> > > unsafe, but I'm saying drm_sched hasn't been designed with this
> > > concurrency in mind, and I fear we'll face subtle bugs if we go from
> > > kthread to multi-threaded-wq+run-and-cleanup-split-in-2-work-items.
> > >   
> > 
> > Yeah, so what we get with that is that job_run() of job A and job_free() of 
> > job
> > B can run in parallel. Unless drivers do weird things there, I'm not seeing 
> > an
> > issue with that as well at a first glance.
> 
> I might be wrong of course, but I'm pretty sure the timestamp race you
> reported is indirectly coming from this ST -> MT transition. Again, I'm
> not saying we should never use an MT wq, but it feels a bit premature,
> and I think I'd prefer if we do it in 2 steps to minimize the amount of
> things that could go wrong, and avoid a late revert.

Indirectly, yes. I would agree with using an internal single threaded workqueue
by default, although I'm a bit more optimistic about that. However, I'd still like
the driver to choose. Otherwise, in Nouveau I'd need to keep queueing work in
free_job() to another workqueue, which isn't very nice.

> 
> > 
> > > > which has the nice side effect that free_job() gets out of the
> > > > fence signaling path. At least as long as the workqueue has max_active 
> > > > > 1.  
> > > 
> > > Oh, yeah, I don't deny using a multi-threaded workqueue has some
> > > benefits, just saying it might be trickier than it sounds.
> > >   
> > > > Which is one reason why I'm using a multi-threaded wq in Nouveau.  
> > > 
> > > Note that I'm using a multi-threaded workqueue internally at the moment
> > > to deal with all sort of interactions with the FW (Mali HW only has a
> > > limited amount of scheduling slots, and we need to rotate entities
> > > having jobs to execute so every one gets a chance to run on the GPU),
> > > but this has been designed this way from the ground up, unlike
> > > drm_sched_main() operations, which were mostly thought as a fixed
> > > sequential set of operations. That's not to say it's impossible to get
> > > right, but I fear we'll face weird/unexpected behavior if we go from
> > > completely-serialized to multi-threaded-with-pseudo-random-processing
> > > order.  
> > 
> > From a per job perspective it's still all sequential and besides fen

Re: [PATCH v2 4/9] drm/sched: Split free_job into own work item

2023-09-12 Thread Boris Brezillon
On Tue, 12 Sep 2023 15:34:41 +0200
Danilo Krummrich  wrote:

> On Tue, Sep 12, 2023 at 03:27:05PM +0200, Boris Brezillon wrote:
> > On Fri, 25 Aug 2023 15:45:49 +0200
> > Christian König  wrote:
> >   
> > >  I tried this patch with Nouveau and found a race condition:
> > > 
> > >  In drm_sched_run_job_work() the job is added to the pending_list via
> > >  drm_sched_job_begin(), then the run_job() callback is called and the 
> > >  scheduled
> > >  fence is signaled.
> > > 
> > >  However, in parallel drm_sched_get_cleanup_job() might be called from
> > >  drm_sched_free_job_work(), which picks the first job from the 
> > >  pending_list and
> > >  for the next job on the pending_list sets the scheduled fence' 
> > >  timestamp field.
> > > >> Well why can this happen in parallel? Either the work items are 
> > > >> scheduled to
> > > >> a single threaded work queue or you have protected the pending list 
> > > >> with
> > > >> some locks.
> > > >>
> > > > Xe uses a single-threaded work queue, Nouveau does not (desired
> > > > behavior).
> > > >
> > > > The list of pending jobs is protected by a lock (safe), the race is:
> > > >
> > > > add job to pending list
> > > > run_job
> > > > signal scheduled fence
> > > >
> > > > dequeue from pending list
> > > > free_job
> > > > update timestamp
> > > >
> > > > Once a job is on the pending list its timestamp can be accessed which
> > > > can blow up if scheduled fence isn't signaled or more specifically 
> > > > unless
> > > > DMA_FENCE_FLAG_TIMESTAMP_BIT is set.
> > 
> > I'm a bit lost. How can this lead to a NULL deref? Timestamp is a
> > ktime_t embedded in dma_fence, and finished/scheduled are both
> > dma_fence objects embedded in drm_sched_fence. So, unless
> > {job,next_job}->s_fence is NULL, or {job,next_job} itself is NULL, I
> > don't really see where the NULL deref is. If s_fence is NULL, that means
> > drm_sched_job_init() wasn't called (unlikely to be detected that late),
> > or ->free_job()/drm_sched_job_cleanup() was called while the job was
> > still in the pending list. I don't really see a situation where job
> > could NULL to be honest.  
> 
> I think the problem here was that a dma_fence' timestamp field is within a 
> union
> together with it's cb_list list_head [1]. If a timestamp is set before the 
> fence
> is actually signalled, dma_fence_signal_timestamp_locked() will access the
> cb_list to run the particular callbacks registered to this dma_fence. However,
> writing the timestap will overwrite this list_head since it's a union, hence
> we'd try to dereference the timestamp while iterating the list.

Ah, right. I didn't notice it was a union, thought it was a struct...

> 
> [1] 
> https://elixir.bootlin.com/linux/latest/source/include/linux/dma-fence.h#L87
> 
> > 
> > While I agree that updating the timestamp before the fence has been
> > flagged as signaled/timestamped is broken (timestamp will be
> > overwritten when dma_fence_signal(scheduled) is called) I don't see a
> > situation where it would cause a NULL/invalid pointer deref. So I
> > suspect there's another race causing jobs to be cleaned up while
> > they're still in the pending_list.
> >   
> > > 
> > > Ah, that problem again. No that is actually quite harmless.
> > > 
> > > You just need to double check if the DMA_FENCE_FLAG_TIMESTAMP_BIT is 
> > > already set and if it's not set don't do anything.  
> >   
> 



Re: [PATCH v2 4/9] drm/sched: Split free_job into own work item

2023-09-12 Thread Boris Brezillon
On Tue, 12 Sep 2023 14:56:06 +0200
Danilo Krummrich  wrote:

> On Tue, Sep 12, 2023 at 02:18:18PM +0200, Boris Brezillon wrote:
> > On Tue, 12 Sep 2023 12:46:26 +0200
> > Danilo Krummrich  wrote:
> >   
> > > > I'm a bit worried that leaving this single vs multi-threaded wq
> > > > decision to drivers is going to cause unnecessary pain, because what
> > > > was previously a granted in term of run/cleanup execution order (thanks
> > > > to the kthread+static-drm_sched_main-workflow approach) is now subject
> > > > to the wq ordering guarantees, which depend on the wq type picked by
> > > > the driver.
> > > 
> > > Not sure if this ends up to be much different. The only thing I could 
> > > think of
> > > is that IIRC with the kthread implementation cleanup was always preferred 
> > > over
> > > run.  
> > 
> > Given the sequence in drm_sched_main(), I'd say that cleanup and run
> > operations are naturally interleaved when both are available, but I
> > might be wrong.  
> 
> From drm_sched_main():
> 
>   wait_event_interruptible(sched->wake_up_worker,
>(cleanup_job = 
> drm_sched_get_cleanup_job(sched)) ||
>(!drm_sched_blocked(sched) &&
> (entity = drm_sched_select_entity(sched))) ||
>kthread_should_stop());
> 
>   if (cleanup_job)
>   sched->ops->free_job(cleanup_job);
> 
>   if (!entity)
>   continue;
> 
> If cleanup_job is not NULL the rest shouldn't be evaluated I guess. Hence 
> entity
> would be NULL and we'd loop until there are no more cleanup_jobs if I don't 
> miss
> anything here.

Indeed, I got tricked by the wait_event() expression.

> 
> >   
> > > With a single threaded wq this should be a bit more balanced.  
> > 
> > With a single threaded wq, it's less clear, because each work
> > reschedules itself for further processing, but it's likely to be more
> > or less interleaved. Anyway, I'm not too worried about cleanup taking
> > precedence on run or the other way around, because the limited amount
> > of HW slots (size of the ring-buffer) will regulate that.  
> 
> Yeah, that's what I meant, with two work items rescheduling themselves it
> starts to be interleaved. Which I'm not worried about as well.
> 
> >   
> > > 
> > > With a multi-threaded wq it's still the same, but run and cleanup can run
> > > concurrently,  
> > 
> > What I'm worried about is that ^. I'm not saying it's fundamentally
> > unsafe, but I'm saying drm_sched hasn't been designed with this
> > concurrency in mind, and I fear we'll face subtle bugs if we go from
> > kthread to multi-threaded-wq+run-and-cleanup-split-in-2-work-items.
> >   
> 
> Yeah, so what we get with that is that job_run() of job A and job_free() of 
> job
> B can run in parallel. Unless drivers do weird things there, I'm not seeing an
> issue with that as well at a first glance.

I might be wrong of course, but I'm pretty sure the timestamp race you
reported is indirectly coming from this ST -> MT transition. Again, I'm
not saying we should never use an MT wq, but it feels a bit premature,
and I think I'd prefer if we do it in 2 steps to minimize the amount of
things that could go wrong, and avoid a late revert.

> 
> > > which has the nice side effect that free_job() gets out of the
> > > fence signaling path. At least as long as the workqueue has max_active > 
> > > 1.  
> > 
> > Oh, yeah, I don't deny using a multi-threaded workqueue has some
> > benefits, just saying it might be trickier than it sounds.
> >   
> > > Which is one reason why I'm using a multi-threaded wq in Nouveau.  
> > 
> > Note that I'm using a multi-threaded workqueue internally at the moment
> > to deal with all sort of interactions with the FW (Mali HW only has a
> > limited amount of scheduling slots, and we need to rotate entities
> > having jobs to execute so every one gets a chance to run on the GPU),
> > but this has been designed this way from the ground up, unlike
> > drm_sched_main() operations, which were mostly thought as a fixed
> > sequential set of operations. That's not to say it's impossible to get
> > right, but I fear we'll face weird/unexpected behavior if we go from
> > completely-serialized to multi-threaded-with-pseudo-random-processing
> > order.  
> 
> From a per job perspective it's still all sequential and besides fence
> dependencies,

Sure, per job ops are still sequential (run, then cleanup once parent
fence is signalled).

> which are still resolved, I don't see where jobs could have cross
> dependencies that make this racy. But agree that it's probably worth to think
> through it a bit more.
> 
> >   
> > > 
> > > That latter seems a bit subtile, we probably need to document this aspect 
> > > of
> > > under which conditions free_job() is or is not within the fence signaling 
> > > path.  
> > 
> > Well, I'm not even sure it can be clearly defined when the driver is
> > using the 

[PATCH v4 2/5] fbdev: Replace fb_pgprotect() with pgprot_framebuffer()

2023-09-12 Thread Thomas Zimmermann
Rename the fbdev mmap helper fb_pgprotect() to pgprot_framebuffer().
The helper sets VMA page-access flags for framebuffers in device I/O
memory.

Also clean up the helper's parameters and return value. Instead of
the VMA instance, pass the individual parameters separately: the existing
page-access flags, the VMA's start and end addresses, and the offset
in the underlying device memory or file, respectively. Return the new
page-access flags. These changes align pgprot_framebuffer() with other
pgprot_*() functions.

v4:
* fix commit message (Christophe)
v3:
* rename fb_pgprotect() to pgprot_framebuffer() (Arnd)

Signed-off-by: Thomas Zimmermann 
---
 arch/ia64/include/asm/fb.h   | 15 +++
 arch/m68k/include/asm/fb.h   | 19 ++-
 arch/mips/include/asm/fb.h   | 11 +--
 arch/powerpc/include/asm/fb.h| 13 +
 arch/sparc/include/asm/fb.h  | 15 +--
 arch/x86/include/asm/fb.h| 10 ++
 arch/x86/video/fbdev.c   | 15 ---
 drivers/video/fbdev/core/fb_chrdev.c |  3 ++-
 include/asm-generic/fb.h | 12 ++--
 9 files changed, 58 insertions(+), 55 deletions(-)

diff --git a/arch/ia64/include/asm/fb.h b/arch/ia64/include/asm/fb.h
index 1717b26fd423f..7fce0d5423590 100644
--- a/arch/ia64/include/asm/fb.h
+++ b/arch/ia64/include/asm/fb.h
@@ -8,17 +8,16 @@
 
 #include 
 
-struct file;
-
-static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
-   unsigned long off)
+static inline pgprot_t pgprot_framebuffer(pgprot_t prot,
+ unsigned long vm_start, unsigned long vm_end,
+ unsigned long offset)
 {
-   if (efi_range_is_wc(vma->vm_start, vma->vm_end - vma->vm_start))
-   vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+   if (efi_range_is_wc(vm_start, vm_end - vm_start))
+   return pgprot_writecombine(prot);
else
-   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+   return pgprot_noncached(prot);
 }
-#define fb_pgprotect fb_pgprotect
+#define pgprot_framebuffer pgprot_framebuffer
 
 static inline void fb_memcpy_fromio(void *to, const volatile void __iomem *from, size_t n)
 {
diff --git a/arch/m68k/include/asm/fb.h b/arch/m68k/include/asm/fb.h
index 24273fc7ad917..9941b7434b696 100644
--- a/arch/m68k/include/asm/fb.h
+++ b/arch/m68k/include/asm/fb.h
@@ -5,26 +5,27 @@
 #include 
 #include 
 
-struct file;
-
-static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
-   unsigned long off)
+static inline pgprot_t pgprot_framebuffer(pgprot_t prot,
+ unsigned long vm_start, unsigned long vm_end,
+ unsigned long offset)
 {
 #ifdef CONFIG_MMU
 #ifdef CONFIG_SUN3
-   pgprot_val(vma->vm_page_prot) |= SUN3_PAGE_NOCACHE;
+   pgprot_val(prot) |= SUN3_PAGE_NOCACHE;
 #else
if (CPU_IS_020_OR_030)
-   pgprot_val(vma->vm_page_prot) |= _PAGE_NOCACHE030;
+   pgprot_val(prot) |= _PAGE_NOCACHE030;
if (CPU_IS_040_OR_060) {
-   pgprot_val(vma->vm_page_prot) &= _CACHEMASK040;
+   pgprot_val(prot) &= _CACHEMASK040;
/* Use no-cache mode, serialized */
-   pgprot_val(vma->vm_page_prot) |= _PAGE_NOCACHE_S;
+   pgprot_val(prot) |= _PAGE_NOCACHE_S;
}
 #endif /* CONFIG_SUN3 */
 #endif /* CONFIG_MMU */
+
+   return prot;
 }
-#define fb_pgprotect fb_pgprotect
+#define pgprot_framebuffer pgprot_framebuffer
 
 #include 
 
diff --git a/arch/mips/include/asm/fb.h b/arch/mips/include/asm/fb.h
index 18b7226403bad..d98d6681d64ec 100644
--- a/arch/mips/include/asm/fb.h
+++ b/arch/mips/include/asm/fb.h
@@ -3,14 +3,13 @@
 
 #include 
 
-struct file;
-
-static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
-   unsigned long off)
+static inline pgprot_t pgprot_framebuffer(pgprot_t prot,
+ unsigned long vm_start, unsigned long vm_end,
+ unsigned long offset)
 {
-   vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+   return pgprot_noncached(prot);
 }
-#define fb_pgprotect fb_pgprotect
+#define pgprot_framebuffer pgprot_framebuffer
 
 /*
  * MIPS doesn't define __raw_ I/O macros, so the helpers
diff --git a/arch/powerpc/include/asm/fb.h b/arch/powerpc/include/asm/fb.h
index 61e3b8806db69..3cecf14d51de8 100644
--- a/arch/powerpc/include/asm/fb.h
+++ b/arch/powerpc/include/asm/fb.h
@@ -2,23 +2,20 @@
 #ifndef _ASM_FB_H_
 #define _ASM_FB_H_
 
-#include 
-
 #include 
 
-static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
-   unsigned long off)
+static inline pgprot_

[PATCH v4 3/5] arch/powerpc: Remove trailing whitespaces

2023-09-12 Thread Thomas Zimmermann
Fix coding style. No functional changes.

Signed-off-by: Thomas Zimmermann 
---
 arch/powerpc/include/asm/machdep.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 4f6e7d7ee3883..933465ed4c432 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -10,7 +10,7 @@
 #include 
 
 struct pt_regs;
-struct pci_bus;
+struct pci_bus;
 struct device_node;
 struct iommu_table;
 struct rtc_time;
@@ -78,8 +78,8 @@ struct machdep_calls {
unsigned char   (*nvram_read_val)(int addr);
void(*nvram_write_val)(int addr, unsigned char val);
ssize_t (*nvram_write)(char *buf, size_t count, loff_t *index);
-   ssize_t (*nvram_read)(char *buf, size_t count, loff_t *index);  
-   ssize_t (*nvram_size)(void);
+   ssize_t (*nvram_read)(char *buf, size_t count, loff_t *index);
+   ssize_t (*nvram_size)(void);
void(*nvram_sync)(void);
 
/* Exception handlers */
@@ -102,9 +102,9 @@ struct machdep_calls {
 */
long(*feature_call)(unsigned int feature, ...);
 
-   /* Get legacy PCI/IDE interrupt mapping */ 
+   /* Get legacy PCI/IDE interrupt mapping */
	int (*pci_get_legacy_ide_irq)(struct pci_dev *dev, int channel);
-   
+
/* Get access protection for /dev/mem */
pgprot_t(*phys_mem_access_prot)(struct file *file,
unsigned long pfn,
-- 
2.42.0



[PATCH v4 5/5] arch/powerpc: Call internal __phys_mem_access_prot() in fbdev code

2023-09-12 Thread Thomas Zimmermann
Call __phys_mem_access_prot() from the fbdev mmap helper
pgprot_framebuffer(). This avoids having to pass a NULL file argument.

Signed-off-by: Thomas Zimmermann 
---
 arch/powerpc/include/asm/fb.h | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/fb.h b/arch/powerpc/include/asm/fb.h
index 3cecf14d51de8..c0c5d1df7ad1e 100644
--- a/arch/powerpc/include/asm/fb.h
+++ b/arch/powerpc/include/asm/fb.h
@@ -8,12 +8,7 @@ static inline pgprot_t pgprot_framebuffer(pgprot_t prot,
  unsigned long vm_start, unsigned long vm_end,
  unsigned long offset)
 {
-   /*
-* PowerPC's implementation of phys_mem_access_prot() does
-* not use the file argument. Set it to NULL in preparation
-* of later updates to the interface.
-*/
-   return phys_mem_access_prot(NULL, PHYS_PFN(offset), vm_end - vm_start, prot);
+   return __phys_mem_access_prot(PHYS_PFN(offset), vm_end - vm_start, prot);
 }
 #define pgprot_framebuffer pgprot_framebuffer
 
-- 
2.42.0



[PATCH v4 1/5] fbdev: Avoid file argument in fb_pgprotect()

2023-09-12 Thread Thomas Zimmermann
Only PowerPC's fb_pgprotect() needs the file argument, although
the implementation does not use it. Pass NULL to the internal
helper in preparation of further updates. A later patch will remove
the file parameter from fb_pgprotect().

While at it, replace the shift operation with PHYS_PFN().

Suggested-by: Christophe Leroy 
Signed-off-by: Thomas Zimmermann 
---
 arch/powerpc/include/asm/fb.h | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/fb.h b/arch/powerpc/include/asm/fb.h
index 5f1a2e5f76548..61e3b8806db69 100644
--- a/arch/powerpc/include/asm/fb.h
+++ b/arch/powerpc/include/asm/fb.h
@@ -9,7 +9,12 @@
 static inline void fb_pgprotect(struct file *file, struct vm_area_struct *vma,
unsigned long off)
 {
-   vma->vm_page_prot = phys_mem_access_prot(file, off >> PAGE_SHIFT,
+   /*
+* PowerPC's implementation of phys_mem_access_prot() does
+* not use the file argument. Set it to NULL in preparation
+* of later updates to the interface.
+*/
+   vma->vm_page_prot = phys_mem_access_prot(NULL, PHYS_PFN(off),
 vma->vm_end - vma->vm_start,
 vma->vm_page_prot);
 }
-- 
2.42.0



[PATCH v4 4/5] arch/powerpc: Remove file parameter from phys_mem_access_prot code

2023-09-12 Thread Thomas Zimmermann
Remove 'file' parameter from struct machdep_calls.phys_mem_access_prot
and its implementation in pci_phys_mem_access_prot(). The file is not
used on PowerPC. By removing it, a later patch can simplify fbdev's
mmap code, which uses phys_mem_access_prot() on PowerPC.

Signed-off-by: Thomas Zimmermann 
---
 arch/powerpc/include/asm/book3s/pgtable.h | 10 --
 arch/powerpc/include/asm/machdep.h|  3 +--
 arch/powerpc/include/asm/nohash/pgtable.h | 10 --
 arch/powerpc/include/asm/pci.h|  4 +---
 arch/powerpc/kernel/pci-common.c  |  3 +--
 arch/powerpc/mm/mem.c |  8 
 6 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/pgtable.h b/arch/powerpc/include/asm/book3s/pgtable.h
index d18b748ea3ae0..84e36a5726417 100644
--- a/arch/powerpc/include/asm/book3s/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/pgtable.h
@@ -20,9 +20,15 @@ extern void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 extern int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
 pte_t *ptep, pte_t entry, int dirty);
 
+extern pgprot_t __phys_mem_access_prot(unsigned long pfn, unsigned long size,
+  pgprot_t vma_prot);
+
 struct file;
-extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
-unsigned long size, pgprot_t vma_prot);
+static inline pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
+					    unsigned long size, pgprot_t vma_prot)
+{
+   return __phys_mem_access_prot(pfn, size, vma_prot);
+}
 #define __HAVE_PHYS_MEM_ACCESS_PROT
 
 void __update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep);
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 933465ed4c432..d31a5ec1550d4 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -106,8 +106,7 @@ struct machdep_calls {
	int (*pci_get_legacy_ide_irq)(struct pci_dev *dev, int channel);
 
/* Get access protection for /dev/mem */
-   pgprot_t(*phys_mem_access_prot)(struct file *file,
-   unsigned long pfn,
+   pgprot_t(*phys_mem_access_prot)(unsigned long pfn,
unsigned long size,
pgprot_t vma_prot);
 
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index a6cb6f922..90366b0b3ad9a 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -246,9 +246,15 @@ extern int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
 
 #define pgprot_writecombine pgprot_noncached_wc
 
+extern pgprot_t __phys_mem_access_prot(unsigned long pfn, unsigned long size,
+  pgprot_t vma_prot);
+
 struct file;
-extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
-unsigned long size, pgprot_t vma_prot);
+static inline pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
+					    unsigned long size, pgprot_t vma_prot)
+{
+   return __phys_mem_access_prot(pfn, size, vma_prot);
+}
 #define __HAVE_PHYS_MEM_ACCESS_PROT
 
 #ifdef CONFIG_HUGETLB_PAGE
diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/asm/pci.h
index 289f1ec85bc54..34ed4d51c546b 100644
--- a/arch/powerpc/include/asm/pci.h
+++ b/arch/powerpc/include/asm/pci.h
@@ -104,9 +104,7 @@ extern void of_scan_pci_bridge(struct pci_dev *dev);
 extern void of_scan_bus(struct device_node *node, struct pci_bus *bus);
 extern void of_rescan_bus(struct device_node *node, struct pci_bus *bus);
 
-struct file;
-extern pgprot_t	pci_phys_mem_access_prot(struct file *file,
-					 unsigned long pfn,
+extern pgprot_t	pci_phys_mem_access_prot(unsigned long pfn,
 					 unsigned long size,
 					 pgprot_t prot);
 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index e88d7c9feeec3..73f12a17e572e 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -521,8 +521,7 @@ int pci_iobar_pfn(struct pci_dev *pdev, int bar, struct vm_area_struct *vma)
  * PCI device, it tries to find the PCI device first and calls the
  * above routine
  */
-pgprot_t pci_phys_mem_access_prot(struct file *file,
- unsigned long pfn,
+pgprot_t pci_phys_mem_access_prot(unsigned long pfn,
  unsigned long size,
  pgprot_t prot)
 {
diff --gi

[PATCH v4 0/5] ppc, fbdev: Clean up fbdev mmap helper

2023-09-12 Thread Thomas Zimmermann
Clean up and rename fb_pgprotect() to work without struct file. Then
refactor the implementation for PowerPC. This change has been discussed
at [1] in the context of refactoring fbdev's mmap code.

The first two patches update fbdev and replace fbdev's fb_pgprotect()
with pgprot_framebuffer() on all architectures. The new helper's streamlined
interface enables more refactoring within fbdev's mmap
implementation.

Patches 3 to 5 adapt PowerPC's internal interfaces to provide
phys_mem_access_prot() that works without struct file. Neither the
architecture code nor the fbdev helpers need the parameter.

v4:
* fix commit message (Christophe)
v3:
* rename fb_pgrotect() to pgprot_framebuffer() (Arnd)
v2:
* reorder patches to simplify merging (Michael)

[1] 
https://lore.kernel.org/linuxppc-dev/5501ba80-bdb0-6344-16b0-0466a950f...@suse.com/

Thomas Zimmermann (5):
  fbdev: Avoid file argument in fb_pgprotect()
  fbdev: Replace fb_pgprotect() with pgprot_framebuffer()
  arch/powerpc: Remove trailing whitespaces
  arch/powerpc: Remove file parameter from phys_mem_access_prot code
  arch/powerpc: Call internal __phys_mem_access_prot() in fbdev code

 arch/ia64/include/asm/fb.h| 15 +++
 arch/m68k/include/asm/fb.h| 19 ++-
 arch/mips/include/asm/fb.h| 11 +--
 arch/powerpc/include/asm/book3s/pgtable.h | 10 --
 arch/powerpc/include/asm/fb.h | 13 +
 arch/powerpc/include/asm/machdep.h| 13 ++---
 arch/powerpc/include/asm/nohash/pgtable.h | 10 --
 arch/powerpc/include/asm/pci.h|  4 +---
 arch/powerpc/kernel/pci-common.c  |  3 +--
 arch/powerpc/mm/mem.c |  8 
 arch/sparc/include/asm/fb.h   | 15 +--
 arch/x86/include/asm/fb.h | 10 ++
 arch/x86/video/fbdev.c| 15 ---
 drivers/video/fbdev/core/fb_chrdev.c  |  3 ++-
 include/asm-generic/fb.h  | 12 ++--
 15 files changed, 86 insertions(+), 75 deletions(-)

-- 
2.42.0



Re: [PATCH v2 4/9] drm/sched: Split free_job into own work item

2023-09-12 Thread Danilo Krummrich
On Tue, Sep 12, 2023 at 03:27:05PM +0200, Boris Brezillon wrote:
> On Fri, 25 Aug 2023 15:45:49 +0200
> Christian König  wrote:
> 
> >  I tried this patch with Nouveau and found a race condition:
> > 
> >  In drm_sched_run_job_work() the job is added to the pending_list via
> >  drm_sched_job_begin(), then the run_job() callback is called and the 
> >  scheduled
> >  fence is signaled.
> > 
> >  However, in parallel drm_sched_get_cleanup_job() might be called from
> >  drm_sched_free_job_work(), which picks the first job from the 
> >  pending_list and
> >  for the next job on the pending_list sets the scheduled fence' 
> >  timestamp field.  
> > >> Well why can this happen in parallel? Either the work items are 
> > >> scheduled to
> > >> a single threaded work queue or you have protected the pending list with
> > >> some locks.
> > >>  
> > > Xe uses a single-threaded work queue, Nouveau does not (desired
> > > behavior).
> > >
> > > The list of pending jobs is protected by a lock (safe), the race is:
> > >
> > > add job to pending list
> > > run_job
> > > signal scheduled fence
> > >
> > > dequeue from pending list
> > > free_job
> > > update timestamp
> > >
> > > Once a job is on the pending list its timestamp can be accessed which
> > > can blow up if scheduled fence isn't signaled or more specifically unless
> > > DMA_FENCE_FLAG_TIMESTAMP_BIT is set.  
> 
> I'm a bit lost. How can this lead to a NULL deref? Timestamp is a
> ktime_t embedded in dma_fence, and finished/scheduled are both
> dma_fence objects embedded in drm_sched_fence. So, unless
> {job,next_job}->s_fence is NULL, or {job,next_job} itself is NULL, I
> don't really see where the NULL deref is. If s_fence is NULL, that means
> drm_sched_job_init() wasn't called (unlikely to be detected that late),
> or ->free_job()/drm_sched_job_cleanup() was called while the job was
> still in the pending list. I don't really see a situation where job
> could be NULL, to be honest.

I think the problem here was that a dma_fence's timestamp field is within a union
together with its cb_list list_head [1]. If a timestamp is set before the fence
is actually signalled, dma_fence_signal_timestamp_locked() will access the
cb_list to run the particular callbacks registered to this dma_fence. However,
writing the timestamp will overwrite this list_head since it's a union, hence
we'd try to dereference the timestamp while iterating the list.

[1] https://elixir.bootlin.com/linux/latest/source/include/linux/dma-fence.h#L87

> 
> While I agree that updating the timestamp before the fence has been
> flagged as signaled/timestamped is broken (timestamp will be
> overwritten when dma_fence_signal(scheduled) is called) I don't see a
> situation where it would cause a NULL/invalid pointer deref. So I
> suspect there's another race causing jobs to be cleaned up while
> they're still in the pending_list.
> 
> > 
> > Ah, that problem again. No that is actually quite harmless.
> > 
> > You just need to double check if the DMA_FENCE_FLAG_TIMESTAMP_BIT is 
> > already set and if it's not set don't do anything.
> 



Re: [PATCH v2 4/9] drm/sched: Split free_job into own work item

2023-09-12 Thread Boris Brezillon
On Fri, 25 Aug 2023 15:45:49 +0200
Christian König  wrote:

>  I tried this patch with Nouveau and found a race condition:
> 
>  In drm_sched_run_job_work() the job is added to the pending_list via
>  drm_sched_job_begin(), then the run_job() callback is called and the 
>  scheduled
>  fence is signaled.
> 
>  However, in parallel drm_sched_get_cleanup_job() might be called from
>  drm_sched_free_job_work(), which picks the first job from the 
>  pending_list and
>  for the next job on the pending_list sets the scheduled fence' timestamp 
>  field.  
> >> Well why can this happen in parallel? Either the work items are scheduled 
> >> to
> >> a single threaded work queue or you have protected the pending list with
> >> some locks.
> >>  
> > Xe uses a single-threaded work queue, Nouveau does not (desired
> > behavior).
> >
> > The list of pending jobs is protected by a lock (safe), the race is:
> >
> > add job to pending list
> > run_job
> > signal scheduled fence
> >
> > dequeue from pending list
> > free_job
> > update timestamp
> >
> > Once a job is on the pending list its timestamp can be accessed which
> > can blow up if scheduled fence isn't signaled or more specifically unless
> > DMA_FENCE_FLAG_TIMESTAMP_BIT is set.  

I'm a bit lost. How can this lead to a NULL deref? Timestamp is a
ktime_t embedded in dma_fence, and finished/scheduled are both
dma_fence objects embedded in drm_sched_fence. So, unless
{job,next_job}->s_fence is NULL, or {job,next_job} itself is NULL, I
don't really see where the NULL deref is. If s_fence is NULL, that means
drm_sched_job_init() wasn't called (unlikely to be detected that late),
or ->free_job()/drm_sched_job_cleanup() was called while the job was
still in the pending list. I don't really see a situation where job
could be NULL, to be honest.

While I agree that updating the timestamp before the fence has been
flagged as signaled/timestamped is broken (timestamp will be
overwritten when dma_fence_signal(scheduled) is called) I don't see a
situation where it would cause a NULL/invalid pointer deref. So I
suspect there's another race causing jobs to be cleaned up while
they're still in the pending_list.

> 
> Ah, that problem again. No that is actually quite harmless.
> 
> You just need to double check if the DMA_FENCE_FLAG_TIMESTAMP_BIT is 
> already set and if it's not set don't do anything.


Re: [PATCH v11] drm: Add initial ci/ subdirectory

2023-09-12 Thread Daniel Stone
Hi Maxime,
Hopefully less mangled formatting this time: turns out Thunderbird +
plain text is utterly unreadable, so that's one less MUA that is
actually usable to send email to kernel lists without getting shouted
at.

On Mon, 11 Sept 2023 at 15:46, Maxime Ripard  wrote:
> On Mon, Sep 11, 2023 at 03:30:55PM +0200, Michel Dänzer wrote:
> > > There's in 6.6-rc1 around 240 reported flaky tests. None of them have
> > > any context. That new series hads a few dozens too, without any context
> > > either. And there's no mention about that being a plan, or a patch
> > > adding a new policy for all tests going forward.
> >
> > That does sound bad, would need to be raised in review.
> >
> > > Any concern I raised were met with a giant "it worked on Mesa" handwave
> >
> > Lessons learned from years of experience with big real-world CI
> > systems like this are hardly "handwaving".
>
> Your (and others) experience certainly isn't. It is valuable, welcome,
> and very much appreciated.
>
> However, my questions and concerns being ignored time and time again
> about things like what is the process is going to be like, what is going
> to be tested, who is going to be maintaining that test list, how that
> interacts with stable, how we can possibly audit the flaky tests list,
> etc. have felt like they were being handwaived away.

Sorry it ended up coming across like that. It wasn't the intent.

> I'm not saying that because I disagree, I still do on some, but that's
> fine to some extent. However, most of these issues are not so much an
> infrastructure issue, but a community issue. And I don't even expect a
> perfect solution right now, unlike what you seem to think. But I do
> expect some kind of plan instead of just ignoring that problem.
>
> Like, I had to ask the MT8173 question 3 times in order to get an
> answer, and I'm still not sure what is going to be done to address that
> particular issue.
>
> So, I'm sorry, but I certainly feel like it here.

I don't quite see the same picture from your side though. For example,
my reading of what you've said is that flaky tests are utterly
unacceptable, as are partial runs, and we shouldn't pretend otherwise.
With your concrete example (which is really helpful, so thanks), what
happens to the MT8173 hdmi-inject test? Do we skip all MT8173 testing
until it's perfect, or does MT8173 testing always fail because that
test does?

Both have their downsides. Not doing any testing has the obvious
downside, and means that the driver can get worse until it gets
perfect. Always marking the test as failed means that the test results
are useless: if failure is expected, then red is good. I mean, say
you're contributing a patch to fix some documentation or add a helper
to common code which only v3d uses. The test results come back, and
your branch is failing tests on MT8173, specifically the
hdmi-inject@4k test. What then? Either as a senior contributor you
'know' that's the case, or as a casual contributor you get told 'oh
yeah, don't worry about the test results, they always fail'. Both lead
to the same outcome, which is that no-one pays any attention to the
results, and they get worse.

What we do agree on is that yes, those tests should absolutely be
fixed, and not just swept under the rug. Part of this is having
maintainers actually meaningfully own their test results. For example,
I'm looking at the expectation lists for the Intel gen in my laptop,
and I'm seeing a lot of breakage in blending tests, as well as
dual-display fails which include the resolution of my external
display. I'd expect the Intel driver maintainers to look at them, get
them fixed, and gradually prune those xfails/flakes down towards zero.

If the maintainers don't own it though, then it's not going to get
fixed. And we are exactly where we are today: broken plane blending
and 1440p on KBL, broken EDID injection on MT8173, and broken atomic
commits on stoney. Without stronger action from the maintainers (e.g.
throwing i915 out of the tree until it has 100% pass 100% of the
time), adding testing isn't making the situation better or worse in
and of itself. What it _is_ doing though, is giving really clear
documentation of the status of each driver, and backing that up by
verifying it.

Only maintainers can actually fix the drivers (or the tests tbf). But
doing the testing does let us be really clear to everyone what the
actual state is, and that way people can make informed decisions too.
And the only way we're going to drive the test rate down is by the
subsystem maintainers enforcing it.

Does that make sense on where I'm (and I think a lot of others are) coming from?

To answer the other question about 'where are the logs?': some of them
have the failure data in them, others don't. They all should going
forward at least though.

Cheers,
Daniel


Re: [PATCH v2 4/9] drm/sched: Split free_job into own work item

2023-09-12 Thread Danilo Krummrich
On Tue, Sep 12, 2023 at 02:18:18PM +0200, Boris Brezillon wrote:
> On Tue, 12 Sep 2023 12:46:26 +0200
> Danilo Krummrich  wrote:
> 
> > > I'm a bit worried that leaving this single vs multi-threaded wq
> > > decision to drivers is going to cause unnecessary pain, because what
> > > was previously a granted in term of run/cleanup execution order (thanks
> > > to the kthread+static-drm_sched_main-workflow approach) is now subject
> > > to the wq ordering guarantees, which depend on the wq type picked by
> > > the driver.  
> > 
> > Not sure if this ends up being much different. The only thing I could think of
> > is that IIRC with the kthread implementation cleanup was always preferred over
> > run.
> 
> Given the sequence in drm_sched_main(), I'd say that cleanup and run
> operations are naturally interleaved when both are available, but I
> might be wrong.

From drm_sched_main():

	wait_event_interruptible(sched->wake_up_worker,
				 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
				 (!drm_sched_blocked(sched) &&
				  (entity = drm_sched_select_entity(sched))) ||
				 kthread_should_stop());

	if (cleanup_job)
		sched->ops->free_job(cleanup_job);

	if (!entity)
		continue;

If cleanup_job is not NULL the rest shouldn't be evaluated I guess. Hence entity
would be NULL and we'd loop until there are no more cleanup_jobs if I don't miss
anything here.

> 
> > With a single threaded wq this should be a bit more balanced.
> 
> With a single threaded wq, it's less clear, because each work
> reschedules itself for further processing, but it's likely to be more
> or less interleaved. Anyway, I'm not too worried about cleanup taking
> precedence on run or the other way around, because the limited amount
> of HW slots (size of the ring-buffer) will regulate that.

Yeah, that's what I meant: with the two work items rescheduling themselves it
starts to be interleaved. Which I'm not worried about either.

> 
> > 
> > With a multi-threaded wq it's still the same, but run and cleanup can run
> > concurrently,
> 
> What I'm worried about is that ^. I'm not saying it's fundamentally
> unsafe, but I'm saying drm_sched hasn't been designed with this
> concurrency in mind, and I fear we'll face subtle bugs if we go from
> kthread to multi-threaded-wq+run-and-cleanup-split-in-2-work-items.
> 

Yeah, so what we get with that is that job_run() of job A and job_free() of job
B can run in parallel. Unless drivers do weird things there, I'm not seeing an
issue with that either at first glance.

> > which has the nice side effect that free_job() gets out of the
> > fence signaling path. At least as long as the workqueue has max_active > 1.
> 
> Oh, yeah, I don't deny using a multi-threaded workqueue has some
> benefits, just saying it might be trickier than it sounds.
> 
> > Which is one reason why I'm using a multi-threaded wq in Nouveau.
> 
> Note that I'm using a multi-threaded workqueue internally at the moment
> to deal with all sort of interactions with the FW (Mali HW only has a
> limited amount of scheduling slots, and we need to rotate entities
> having jobs to execute so every one gets a chance to run on the GPU),
> but this has been designed this way from the ground up, unlike
> drm_sched_main() operations, which were mostly thought as a fixed
> sequential set of operations. That's not to say it's impossible to get
> right, but I fear we'll face weird/unexpected behavior if we go from
> completely-serialized to multi-threaded-with-pseudo-random-processing
> order.

From a per-job perspective it's still all sequential, and besides fence
dependencies, which are still resolved, I don't see where jobs could have cross
dependencies that make this racy. But I agree that it's probably worth thinking
it through a bit more.

> 
> > 
> > That latter seems a bit subtle, we probably need to document this aspect of
> > under which conditions free_job() is or is not within the fence signaling 
> > path.
> 
> Well, I'm not even sure it can be clearly defined when the driver is
> using the submit_wq for its own work items (which can be done since we
> pass an optional submit_wq when calling drm_sched_init()). Sure, having
> max_active >= 2 should be enough to guarantee that the free_job work
> won't block the run_job one when these are the 2 only works being
> queued, but what if you have many other work items being queued by the
> driver to this wq, and some of those try to acquire resv locks? Could
> this prevent execution of the run_job() callback, thus preventing
> signaling of fences? I'm genuinely asking, don't know enough about the
> cmwq implementation to tell what's happening when work items are
> blocked (might be that the worker pool is extended to unblock the
> situation).

Yes, I think so. If max_active were 2 and you have two jobs running on this

Re: [PATCH v3 1/4] drm/ttm/tests: Add tests for ttm_resource and ttm_sys_man

2023-09-12 Thread Christian König

Am 12.09.23 um 13:49 schrieb Karolina Stolarek:

Test initialization of ttm_resource using different memory domains.
Add tests for a system memory manager and functions that can be
tested without a fully-featured resource manager. Update
ttm_bo_kunit_init() to initialize BO's kref and reservation object.
Export ttm_resource_alloc symbol for test purposes only.

Signed-off-by: Karolina Stolarek 
---
  drivers/gpu/drm/ttm/tests/Makefile|   1 +
  drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c |  23 ++
  drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.h |   4 +
  drivers/gpu/drm/ttm/tests/ttm_resource_test.c | 335 ++
  drivers/gpu/drm/ttm/ttm_resource.c|   3 +
  5 files changed, 366 insertions(+)
  create mode 100644 drivers/gpu/drm/ttm/tests/ttm_resource_test.c

diff --git a/drivers/gpu/drm/ttm/tests/Makefile b/drivers/gpu/drm/ttm/tests/Makefile
index ec87c4fc1ad5..c92fe2052ef6 100644
--- a/drivers/gpu/drm/ttm/tests/Makefile
+++ b/drivers/gpu/drm/ttm/tests/Makefile
@@ -3,4 +3,5 @@
  obj-$(CONFIG_DRM_TTM_KUNIT_TEST) += \
  ttm_device_test.o \
  ttm_pool_test.o \
+ttm_resource_test.o \
  ttm_kunit_helpers.o
diff --git a/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c b/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c
index 81661d8827aa..eccc59b981f8 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c
@@ -38,10 +38,33 @@ struct ttm_buffer_object *ttm_bo_kunit_init(struct kunit *test,
bo->base = gem_obj;
bo->bdev = devs->ttm_dev;
  
+	kref_init(&bo->kref);
+
+	dma_resv_init(&bo->base._resv);
+	bo->base.resv = &bo->base._resv;
+


I'm really wondering if we shouldn't now initialize the GEM object properly?

That would also initialize the reservation object if I remember correctly.

The solution with EXPORT_SYMBOL_FOR_TESTS_ONLY looks really nice I think 
and apart from that I can't see anything obviously wrong either, but I 
only skimmed over the code.


Regards,
Christian.


return bo;
  }
  EXPORT_SYMBOL_GPL(ttm_bo_kunit_init);
  
+struct ttm_place *ttm_place_kunit_init(struct kunit *test,
+				       uint32_t mem_type, uint32_t flags,
+				       size_t size)
+{
+   struct ttm_place *place;
+
+   place = kunit_kzalloc(test, sizeof(*place), GFP_KERNEL);
+   KUNIT_ASSERT_NOT_NULL(test, place);
+
+   place->mem_type = mem_type;
+   place->flags = flags;
+   place->fpfn = size >> PAGE_SHIFT;
+   place->lpfn = place->fpfn + (size >> PAGE_SHIFT);
+
+   return place;
+}
+EXPORT_SYMBOL_GPL(ttm_place_kunit_init);
+
  struct ttm_test_devices *ttm_test_devices_basic(struct kunit *test)
  {
struct ttm_test_devices *devs;
diff --git a/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.h b/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.h
index e261e3660d0b..f38140f93c05 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.h
+++ b/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.h
@@ -8,6 +8,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 

  #include 
@@ -28,6 +29,9 @@ int ttm_device_kunit_init(struct ttm_test_devices *priv,
  struct ttm_buffer_object *ttm_bo_kunit_init(struct kunit *test,
struct ttm_test_devices *devs,
size_t size);
+struct ttm_place *ttm_place_kunit_init(struct kunit *test,
+  uint32_t mem_type, uint32_t flags,
+  size_t size);
  
  struct ttm_test_devices *ttm_test_devices_basic(struct kunit *test);

  struct ttm_test_devices *ttm_test_devices_all(struct kunit *test);
diff --git a/drivers/gpu/drm/ttm/tests/ttm_resource_test.c b/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
new file mode 100644
index 000000000000..851cdc43dc37
--- /dev/null
+++ b/drivers/gpu/drm/ttm/tests/ttm_resource_test.c
@@ -0,0 +1,335 @@
+// SPDX-License-Identifier: GPL-2.0 AND MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+#include 
+
+#include "ttm_kunit_helpers.h"
+
+#define RES_SIZE   SZ_4K
+#define TTM_PRIV_DUMMY_REG (TTM_NUM_MEM_TYPES - 1)
+
+struct ttm_resource_test_case {
+   const char *description;
+   uint32_t mem_type;
+   uint32_t flags;
+};
+
+struct ttm_resource_test_priv {
+   struct ttm_test_devices *devs;
+   struct ttm_buffer_object *bo;
+   struct ttm_place *place;
+};
+
+static const struct ttm_resource_manager_func ttm_resource_manager_mock_funcs = { };
+
+static int ttm_resource_test_init(struct kunit *test)
+{
+   struct ttm_resource_test_priv *priv;
+
+   priv = kunit_kzalloc(test, sizeof(*priv), GFP_KERNEL);
+   KUNIT_ASSERT_NOT_NULL(test, priv);
+
+   priv->devs = ttm_test_devices_all(test);
+   KUNIT_ASSERT_NOT_NULL(test, priv->devs);
+
+   test->priv = priv;
+
+   return 0;
+}
+
+static void ttm_resource_test_fini(

Re: Re: [PATCH] drm/komeda: add NV12 format to support writeback layer type

2023-09-12 Thread Liu Lucas/刘保柱
Hi Liviu,

Thank you so much for reviewing this patch! I look forward to it being
merged.

Best Regards,
baozhu.liu

-----Original Message-----
From: liviu.du...@arm.com
Sent: 11 September 2023 22:46
To: Liu Lucas/刘保柱
Cc: airl...@gmail.com; dan...@ffwll.ch; dri-devel@lists.freedesktop.org; linux-ker...@vger.kernel.org
Subject: Re: Re: [PATCH] drm/komeda: add NV12 format to support writeback layer type

Hi Liu,

Sorry about the delay, I was on holiday until 28th and while cleaning up the 
backlog I've accidentally marked the email as read and did not reply.


On Fri, Sep 08, 2023 at 08:11:44AM +, Liu Lucas/刘保柱 wrote:
> Hi  all,
> 
>   Do you have any suggestions for the patch I submitted? Please also let 
> me know, thank you!
> 
> Best Regards,
> baozhu.liu
> -----Original Message-----
> From: baozhu.liu
> Sent: 29 August 2023 17:30
> To: liviu.du...@arm.com; airl...@gmail.com; dan...@ffwll.ch
> Cc: dri-devel@lists.freedesktop.org; linux-ker...@vger.kernel.org; Liu Lucas/刘保柱
> Subject: [PATCH] drm/komeda: add NV12 format to support writeback layer type
> 
> When testing the d71 writeback layer function, the output format is set to 
> NV12, and the following error message is displayed:
> 
> [drm:komeda_fb_is_layer_supported] Layer TYPE: 4 doesn't support fb FMT: NV12 
> little-endian (0x3231564e) with modifier: 0x0..
> 
> Check the d71 data manual, writeback layer output formats includes NV12 
> format.
> 
> Signed-off-by: baozhu.liu 

Acked-by: Liviu Dudau 

I will push the patch this week into drm-misc-next.

Best regards,
Liviu

> ---
>  drivers/gpu/drm/arm/display/komeda/d71/d71_dev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> > diff --git a/drivers/gpu/drm/arm/display/komeda/d71/d71_dev.c b/drivers/gpu/drm/arm/display/komeda/d71/d71_dev.c
> index 6c56f5662bc7..80973975bfdb 100644
> --- a/drivers/gpu/drm/arm/display/komeda/d71/d71_dev.c
> +++ b/drivers/gpu/drm/arm/display/komeda/d71/d71_dev.c
> @@ -521,7 +521,7 @@ static struct komeda_format_caps d71_format_caps_table[] 
> = {
>   {__HW_ID(5, 1), DRM_FORMAT_YUYV,RICH,   Rot_ALL_H_V,
> LYT_NM, AFB_TH}, /* afbc */
>   {__HW_ID(5, 2), DRM_FORMAT_YUYV,RICH,   Flip_H_V,   
> 0, 0},
>   {__HW_ID(5, 3), DRM_FORMAT_UYVY,RICH,   Flip_H_V,   
> 0, 0},
> - {__HW_ID(5, 6), DRM_FORMAT_NV12,RICH,   Flip_H_V,   
> 0, 0},
> + {__HW_ID(5, 6), DRM_FORMAT_NV12,RICH_WB,Flip_H_V,   
> 0, 0},
>   {__HW_ID(5, 6), DRM_FORMAT_YUV420_8BIT, RICH,   Rot_ALL_H_V,
> LYT_NM, AFB_TH}, /* afbc */
>   {__HW_ID(5, 7), DRM_FORMAT_YUV420,  RICH,   Flip_H_V,   
> 0, 0},
>   /* YUV 10bit*/
> --
> 2.17.1
> 

--

| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---
¯\_(ツ)_/¯


Re: [PATCH v3 09/10] drm/msm/a6xx: Add A740 support

2023-09-12 Thread Konrad Dybcio
On 23.08.2023 14:56, Konrad Dybcio wrote:
> A740 builds upon the A730 IP, shuffling some values and registers
> around. More differences will appear when things like BCL are
> implemented.
> 
> adreno_is_a740_family is added in preparation for more A7xx GPUs, so
> the logic checks will remain valid, resulting in smaller diffs.
> 
> Tested-by: Neil Armstrong  # on SM8550-QRD
> Tested-by: Dmitry Baryshkov  # sm8450
> Signed-off-by: Konrad Dybcio 
> ---
[...]

>   .gmem = SZ_2M,
>   .inactive_period = DRM_MSM_INACTIVE_PERIOD,
> + .quirks = ADRENO_QUIRK_HAS_CACHED_COHERENT |
> +   ADRENO_QUIRK_HAS_HW_APRIV,
That's a funny conflict resolution (should have been in the previous
commit..). If there are no other comments, could you fix this up while
applying, Rob?

Konrad


Re: [PATCH v2 4/9] drm/sched: Split free_job into own work item

2023-09-12 Thread Boris Brezillon
On Tue, 12 Sep 2023 12:46:26 +0200
Danilo Krummrich  wrote:

> > I'm a bit worried that leaving this single vs multi-threaded wq
> > decision to drivers is going to cause unnecessary pain, because what
> > was previously a granted in term of run/cleanup execution order (thanks
> > to the kthread+static-drm_sched_main-workflow approach) is now subject
> > to the wq ordering guarantees, which depend on the wq type picked by
> > the driver.  
> 
> Not sure if this ends up being much different. The only thing I could think of
> is that IIRC with the kthread implementation cleanup was always preferred over
> run.

Given the sequence in drm_sched_main(), I'd say that cleanup and run
operations are naturally interleaved when both are available, but I
might be wrong.

> With a single threaded wq this should be a bit more balanced.

With a single threaded wq, it's less clear, because each work
reschedules itself for further processing, but it's likely to be more
or less interleaved. Anyway, I'm not too worried about cleanup taking
precedence on run or the other way around, because the limited amount
of HW slots (size of the ring-buffer) will regulate that.

> 
> With a multi-threaded wq it's still the same, but run and cleanup can run
> concurrently,

What I'm worried about is that ^. I'm not saying it's fundamentally
unsafe, but I'm saying drm_sched hasn't been designed with this
concurrency in mind, and I fear we'll face subtle bugs if we go from
kthread to multi-threaded-wq+run-and-cleanup-split-in-2-work-items.

> which has the nice side effect that free_job() gets out of the
> fence signaling path. At least as long as the workqueue has max_active > 1.

Oh, yeah, I don't deny using a multi-threaded workqueue has some
benefits, just saying it might be trickier than it sounds.

> Which is one reason why I'm using a multi-threaded wq in Nouveau.

Note that I'm using a multi-threaded workqueue internally at the moment
to deal with all sort of interactions with the FW (Mali HW only has a
limited amount of scheduling slots, and we need to rotate entities
having jobs to execute so every one gets a chance to run on the GPU),
but this has been designed this way from the ground up, unlike
drm_sched_main() operations, which were mostly thought as a fixed
sequential set of operations. That's not to say it's impossible to get
right, but I fear we'll face weird/unexpected behavior if we go from
completely-serialized to multi-threaded-with-pseudo-random-processing
order.

> 
> That latter seems a bit subtle, we probably need to document this aspect of
> under which conditions free_job() is or is not within the fence signaling 
> path.

Well, I'm not even sure it can be clearly defined when the driver is
using the submit_wq for its own work items (which can be done since we
pass an optional submit_wq when calling drm_sched_init()). Sure, having
max_active >= 2 should be enough to guarantee that the free_job work
won't block the run_job one when these are the 2 only works being
queued, but what if you have many other work items being queued by the
driver to this wq, and some of those try to acquire resv locks? Could
this prevent execution of the run_job() callback, thus preventing
signaling of fences? I'm genuinely asking, don't know enough about the
cmwq implementation to tell what's happening when work items are
blocked (might be that the worker pool is extended to unblock the
situation).

Anyway, documenting when free_job() is in the dma signalling path should
be doable (single-threaded wq), but at this point, are we not better
off considering anything called from the submit_wq as being part of the
dma signalling path, so we can accommodate with both cases. And if
there is cleanup processing that require taking dma_resv locks, I'd be
tempted to queue that to a driver-specific wq (which is what I'm doing
right now), just to be safe.


Re: EXT: Re: [RFC] drm/bridge: megachips-stdpxxxx-ge-b850v3-fw: switch to drm_do_get_edid()

2023-09-12 Thread Jani Nikula
On Fri, 08 Sep 2023, Ian Ray  wrote:
> On Fri, Sep 01, 2023 at 05:52:02PM +0300, Jani Nikula wrote:
>> 
>> On Fri, 01 Sep 2023, Jani Nikula  wrote:
>> > The driver was originally added in commit fcfa0ddc18ed ("drm/bridge:
>> > Drivers for megachips-stdp-ge-b850v3-fw (LVDS-DP++)"). I tried to
>> > look up the discussion, but didn't find anyone questioning the EDID
>> > reading part.
>> >
>> > Why does it not use drm_get_edid() or drm_do_get_edid()?
>> >
>> > I don't know where client->addr comes from, so I guess it could be
>> > different from DDC_ADDR, rendering drm_get_edid() unusable.
>> >
>> > There's also the comment:
>> >
>> >/* Yes, read the entire buffer, and do not skip the first
>> > * EDID_LENGTH bytes.
>> > */
>> >
>> > But again, there's not a word on *why*.
>> >
>> > Maybe we could just use drm_do_get_edid()? I'd like drivers to migrate
>> > away from their own EDID parsing and validity checks, including stop
>> > using drm_edid_block_valid(). (And long term switch to drm_edid_read(),
>> > struct drm_edid, and friends, but this is the first step.)
>> >
>> > Cc: Andrzej Hajda 
>> > Cc: Ian Ray 
>> > Cc: Jernej Skrabec 
>> > Cc: Jonas Karlman 
>> > Cc: Laurent Pinchart 
>> > Cc: Martin Donnelly 
>> > Cc: Martyn Welch 
>> > Cc: Neil Armstrong 
>> > Cc: Peter Senna Tschudin 
>> > Cc: Robert Foss 
>> > Cc: Yuan Can 
>> > Cc: Zheyu Ma 
>> > Signed-off-by: Jani Nikula 
>> >
>> > ---
>> >
>> > I haven't even tried to compile this, and I have no way to test
>> > this. Apologies for the long Cc list; I'm hoping someone could explain
>> > the existing code, and perhaps give this approach a spin.
>> > ---
>> >  .../bridge/megachips-stdpxxxx-ge-b850v3-fw.c  | 57 +++
>> >  1 file changed, 9 insertions(+), 48 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/bridge/megachips-stdpxxxx-ge-b850v3-fw.c 
>> > b/drivers/gpu/drm/bridge/megachips-stdpxxxx-ge-b850v3-fw.c
>> > index 460db3c8a08c..0d9eacf3d9b7 100644
>> > --- a/drivers/gpu/drm/bridge/megachips-stdpxxxx-ge-b850v3-fw.c
>> > +++ b/drivers/gpu/drm/bridge/megachips-stdpxxxx-ge-b850v3-fw.c
>> > @@ -65,12 +65,11 @@ struct ge_b850v3_lvds {
>> >  
>> >  static struct ge_b850v3_lvds *ge_b850v3_lvds_ptr;
>> >  
>> > -static u8 *stdp2690_get_edid(struct i2c_client *client)
>> > +static int stdp2690_read_block(void *context, u8 *buf, unsigned int 
>> > block, size_t len)
>> >  {
>> > +  struct i2c_client *client = context;
>> >struct i2c_adapter *adapter = client->adapter;
>> > -  unsigned char start = 0x00;
>> > -  unsigned int total_size;
>> > -  u8 *block = kmalloc(EDID_LENGTH, GFP_KERNEL);
>> > +  unsigned char start = block * EDID_LENGTH;
>> >  
>> >struct i2c_msg msgs[] = {
>> >{
>> > @@ -81,53 +80,15 @@ static u8 *stdp2690_get_edid(struct i2c_client *client)
>> >}, {
>> >.addr   = client->addr,
>> >.flags  = I2C_M_RD,
>> > -  .len= EDID_LENGTH,
>> > -  .buf= block,
>> > +  .len= len,
>> > +  .buf= buf,
>> >}
>> >};
>> >  
>> > -  if (!block)
>> > -  return NULL;
>> > +  if (i2c_transfer(adapter, msgs, 2) != 2)
>> > +  return -1;
>> >  
>> > -  if (i2c_transfer(adapter, msgs, 2) != 2) {
>> > -  DRM_ERROR("Unable to read EDID.\n");
>> > -  goto err;
>> > -  }
>> > -
>> > -  if (!drm_edid_block_valid(block, 0, false, NULL)) {
>> > -  DRM_ERROR("Invalid EDID data\n");
>> > -  goto err;
>> > -  }
>> > -
>> > -  total_size = (block[EDID_EXT_BLOCK_CNT] + 1) * EDID_LENGTH;
>> > -  if (total_size > EDID_LENGTH) {
>> > -  kfree(block);
>> > -  block = kmalloc(total_size, GFP_KERNEL);
>> > -  if (!block)
>> > -  return NULL;
>> > -
>> > -  /* Yes, read the entire buffer, and do not skip the first
>> > -   * EDID_LENGTH bytes.
>> > -   */
>> > -  start = 0x00;
>> > -  msgs[1].len = total_size;
>> > -  msgs[1].buf = block;
>> > -
>> > -  if (i2c_transfer(adapter, msgs, 2) != 2) {
>> > -  DRM_ERROR("Unable to read EDID extension blocks.\n");
>> > -  goto err;
>> > -  }
>> > -  if (!drm_edid_block_valid(block, 1, false, NULL)) {
>> > -  DRM_ERROR("Invalid EDID data\n");
>> > -  goto err;
>> > -  }
>> > -  }
>> > -
>> > -  return block;
>> > -
>> > -err:
>> > -  kfree(block);
>> > -  return NULL;
>> > +  return 0;
>> >  }
>> >  
>> >  static struct edid *ge_b850v3_lvds_get_edid(struct drm_bridge *bridge,
>> > @@ -137,7 +98,7 @@ static struct edid *ge_b850v3_lvds_get_edid(struct 
>> > drm_bridge *bridge,
>> >  
>> >client = ge_b850v3_lvds_ptr->stdp2690_i2c;
>> >  
>> > -  return (struct edid *)stdp2690_get_edid(client);
>> > +  return drm_do_get_edid(connector, stdp2690_read_block, client, NULL);
>> 
>> The last NULL param should be dropped, as 

Re: [PATCH] drm/msm/dp: skip validity check for DP CTS EDID checksum

2023-09-12 Thread Jani Nikula
On Thu, 07 Sep 2023, Stephen Boyd  wrote:
> Quoting Jani Nikula (2023-09-01 07:20:34)
>> The DP CTS test for EDID last block checksum expects the checksum for
>> the last block, invalid or not. Skip the validity check.
>>
>> For the most part (*), the EDIDs returned by drm_get_edid() will be
>> valid anyway, and there's the CTS workaround to get the checksum for
>> completely invalid EDIDs. See commit 7948fe12d47a ("drm/msm/dp: return
>> correct edid checksum after corrupted edid checksum read").
>>
>> This lets us remove one user of drm_edid_block_valid() with hopes the
>> function can be removed altogether in the future.
>>
>> (*) drm_get_edid() ignores checksum errors on CTA extensions.
>>
>> Cc: Abhinav Kumar 
>> Cc: Dmitry Baryshkov 
>> Cc: Kuogee Hsieh 
>> Cc: Marijn Suijten 
>> Cc: Rob Clark 
>> Cc: Sean Paul 
>> Cc: Stephen Boyd 
>> Cc: linux-arm-...@vger.kernel.org
>> Cc: freedr...@lists.freedesktop.org
>> Signed-off-by: Jani Nikula 
>> ---
>
> Reviewed-by: Stephen Boyd 

Thanks; is that enough to merge? I can't claim I would have been able to
test this.

>
>>
>> diff --git a/drivers/gpu/drm/msm/dp/dp_panel.c 
>> b/drivers/gpu/drm/msm/dp/dp_panel.c
>> index 42d52510ffd4..86a8e06c7a60 100644
>> --- a/drivers/gpu/drm/msm/dp/dp_panel.c
>> +++ b/drivers/gpu/drm/msm/dp/dp_panel.c
>> @@ -289,26 +289,9 @@ int dp_panel_get_modes(struct dp_panel *dp_panel,
>>
>>  static u8 dp_panel_get_edid_checksum(struct edid *edid)
>
> It would be nice to make 'edid' const here in another patch.

Sure.

BR,
Jani.


-- 
Jani Nikula, Intel
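For context on the thread above: the checksum the DP CTS test wants is the standard EDID block checksum, where the last byte of each 128-byte block is chosen so the whole block sums to 0 mod 256. A minimal userspace sketch of that computation (not the msm driver code):

```c
#include <stddef.h>

/* Compute the checksum byte for a 128-byte EDID block: the value that
 * makes the sum of all 128 bytes equal 0 mod 256. The CTS test expects
 * this byte reported for the last block, valid or not. */
static unsigned char edid_block_checksum(const unsigned char block[128])
{
	unsigned char sum = 0;
	size_t i;

	for (i = 0; i < 127; i++)		/* bytes 0..126, excluding the checksum itself */
		sum += block[i];
	return (unsigned char)(0x100 - sum);	/* makes the 128-byte sum 0 mod 256 */
}
```

This is why skipping the validity check is safe for the CTS case: the checksum is a pure function of the block contents, whether or not the rest of the block parses.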


[PATCH v3 4/4] drm/ttm/tests: Fix argument in ttm_tt_kunit_init()

2023-09-12 Thread Karolina Stolarek
Remove a leftover definition of page order and pass an empty flag value
in ttm_pool_pre_populated().

Signed-off-by: Karolina Stolarek 
---
 drivers/gpu/drm/ttm/tests/ttm_pool_test.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/tests/ttm_pool_test.c 
b/drivers/gpu/drm/ttm/tests/ttm_pool_test.c
index 2d9cae8cd984..b97f7b6daf5b 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_pool_test.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_pool_test.c
@@ -78,10 +78,9 @@ static struct ttm_pool *ttm_pool_pre_populated(struct kunit 
*test,
struct ttm_test_devices *devs = priv->devs;
struct ttm_pool *pool;
struct ttm_tt *tt;
-   unsigned long order = __fls(size / PAGE_SIZE);
int err;
 
-   tt = ttm_tt_kunit_init(test, order, caching, size);
+   tt = ttm_tt_kunit_init(test, 0, caching, size);
KUNIT_ASSERT_NOT_NULL(test, tt);
 
pool = kunit_kzalloc(test, sizeof(*pool), GFP_KERNEL);
-- 
2.25.1
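For reference, the `order` variable removed above was computed as `__fls(size / PAGE_SIZE)`: the index of the highest set bit of the page count, i.e. the largest order such that `(1 << order)` pages fit in `size`. A userspace sketch of that helper (kernel `__fls` is undefined for 0, so callers pass at least one page):

```c
/* Userspace stand-in for __fls(npages): position of the highest set bit.
 * fls_order(2) == 1, so an SZ_8K buffer with 4 KiB pages gave order 1,
 * which is why passing a literal 0 changes what ttm_tt_kunit_init() sees. */
static unsigned int fls_order(unsigned long npages)
{
	unsigned int order = 0;

	while (npages >>= 1)
		order++;
	return order;
}
```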



[PATCH v3 3/4] drm/ttm/tests: Add tests for ttm_bo functions

2023-09-12 Thread Karolina Stolarek
Test reservation and release of TTM buffer objects. Add tests to check
pin and unpin operations.

Signed-off-by: Karolina Stolarek 
---
 drivers/gpu/drm/ttm/tests/Makefile|   1 +
 drivers/gpu/drm/ttm/tests/ttm_bo_test.c   | 620 ++
 drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c |   5 +
 3 files changed, 626 insertions(+)
 create mode 100644 drivers/gpu/drm/ttm/tests/ttm_bo_test.c

diff --git a/drivers/gpu/drm/ttm/tests/Makefile 
b/drivers/gpu/drm/ttm/tests/Makefile
index f570530bbb60..468535f7eed2 100644
--- a/drivers/gpu/drm/ttm/tests/Makefile
+++ b/drivers/gpu/drm/ttm/tests/Makefile
@@ -5,4 +5,5 @@ obj-$(CONFIG_DRM_TTM_KUNIT_TEST) += \
 ttm_pool_test.o \
 ttm_resource_test.o \
 ttm_tt_test.o \
+ttm_bo_test.o \
 ttm_kunit_helpers.o
diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_test.c 
b/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
new file mode 100644
index ..48d71f1fe65d
--- /dev/null
+++ b/drivers/gpu/drm/ttm/tests/ttm_bo_test.c
@@ -0,0 +1,620 @@
+// SPDX-License-Identifier: GPL-2.0 AND MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "ttm_kunit_helpers.h"
+
#define BO_SIZE		SZ_8K
+
+struct ttm_bo_test_case {
+   const char *description;
+   bool interruptible;
+   bool no_wait;
+};
+
+static const struct ttm_bo_test_case ttm_bo_reserved_cases[] = {
+   {
+   .description = "Cannot be interrupted and sleeps",
+   .interruptible = false,
+   .no_wait = false,
+   },
+   {
+   .description = "Cannot be interrupted, locks straight away",
+   .interruptible = false,
+   .no_wait = true,
+   },
+   {
+   .description = "Can be interrupted, sleeps",
+   .interruptible = true,
+   .no_wait = false,
+   },
+};
+
+static void ttm_bo_init_case_desc(const struct ttm_bo_test_case *t,
+ char *desc)
+{
+   strscpy(desc, t->description, KUNIT_PARAM_DESC_SIZE);
+}
+
+KUNIT_ARRAY_PARAM(ttm_bo_reserve, ttm_bo_reserved_cases, 
ttm_bo_init_case_desc);
+
+static void ttm_bo_reserve_optimistic_no_ticket(struct kunit *test)
+{
+   const struct ttm_bo_test_case *params = test->param_value;
+   struct ttm_buffer_object *bo;
+   int err;
+
+   bo = ttm_bo_kunit_init(test, test->priv, BO_SIZE);
+
+   err = ttm_bo_reserve(bo, params->interruptible, params->no_wait, NULL);
+   KUNIT_ASSERT_EQ(test, err, 0);
+
+   dma_resv_unlock(bo->base.resv);
+}
+
+static void ttm_bo_reserve_locked_no_sleep(struct kunit *test)
+{
+   struct ttm_buffer_object *bo;
+   bool interruptible = false;
+   bool no_wait = true;
+   int err;
+
+   bo = ttm_bo_kunit_init(test, test->priv, BO_SIZE);
+
+   /* Let's lock it beforehand */
+   dma_resv_lock(bo->base.resv, NULL);
+
+   err = ttm_bo_reserve(bo, interruptible, no_wait, NULL);
+   dma_resv_unlock(bo->base.resv);
+
+   KUNIT_ASSERT_EQ(test, err, -EBUSY);
+}
+
+static void ttm_bo_reserve_no_wait_ticket(struct kunit *test)
+{
+   struct ttm_buffer_object *bo;
+   struct ww_acquire_ctx ctx;
+   bool interruptible = false;
+   bool no_wait = true;
+   int err;
+
+   ww_acquire_init(&ctx, &reservation_ww_class);
+
+   bo = ttm_bo_kunit_init(test, test->priv, BO_SIZE);
+
+   err = ttm_bo_reserve(bo, interruptible, no_wait, &ctx);
+   KUNIT_ASSERT_EQ(test, err, -EBUSY);
+
+   ww_acquire_fini(&ctx);
+}
+
+static void ttm_bo_reserve_double_resv(struct kunit *test)
+{
+   struct ttm_buffer_object *bo;
+   struct ww_acquire_ctx ctx;
+   bool interruptible = false;
+   bool no_wait = false;
+   int err;
+
+   ww_acquire_init(&ctx, &reservation_ww_class);
+
+   bo = ttm_bo_kunit_init(test, test->priv, BO_SIZE);
+
+   err = ttm_bo_reserve(bo, interruptible, no_wait, &ctx);
+   KUNIT_ASSERT_EQ(test, err, 0);
+
+   err = ttm_bo_reserve(bo, interruptible, no_wait, &ctx);
+
+   ww_acquire_fini(&ctx);
+   dma_resv_unlock(bo->base.resv);
+
+   KUNIT_ASSERT_EQ(test, err, -EALREADY);
+}
+
+/*
+ * A test case heavily inspired by ww_test_edeadlk_normal(). Checks
+ * if -EDEADLK is properly propagated by ttm_bo_reserve()
+ */
+static void ttm_bo_reserve_deadlock(struct kunit *test)
+{
+   struct ttm_buffer_object *bo1, *bo2;
+   struct ww_acquire_ctx ctx1, ctx2;
+   bool interruptible = false;
+   bool no_wait = false;
+   int err;
+
+   bo1 = ttm_bo_kunit_init(test, test->priv, BO_SIZE);
+   bo2 = ttm_bo_kunit_init(test, test->priv, BO_SIZE);
+
+   mutex_lock(&bo2->base.resv->lock.base);
+   bo2->base.resv->lock.ctx = &ctx2;
+
+   ww_acquire_init(&ctx1, &reservation_ww_class);
+   ctx2 = ctx1;
+   ctx2.stamp--; /* Make the cont

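For readers unfamiliar with the wound/wait rule the deadlock test above relies on: each ww acquire context carries a stamp, and on contention the younger context (the larger stamp) backs off with -EDEADLK so the older one can make forward progress, while relocking from the same context yields -EALREADY (as ttm_bo_reserve_double_resv() checks). A toy stand-in for that decision, not the kernel implementation:

```c
#include <errno.h>

/* Toy wound/wait contention rule: given the stamp of the context holding
 * the lock and the stamp of the context trying to take it, decide the
 * outcome. Smaller stamp == older context == wins. */
static int toy_ww_contend(unsigned long holder_stamp, unsigned long waiter_stamp)
{
	if (waiter_stamp == holder_stamp)
		return -EALREADY;	/* same context relocking its own lock */
	if (waiter_stamp > holder_stamp)
		return -EDEADLK;	/* younger context must back off and retry */
	return 0;			/* older context may wait for the lock */
}
```

This is why the test decrements ctx2's stamp: it makes ctx2 the older context, forcing ctm_bo_reserve() under ctx1 to see -EDEADLK.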
[PATCH v3 2/4] drm/ttm/tests: Add tests for ttm_tt

2023-09-12 Thread Karolina Stolarek
Test initialization, creation and destruction of ttm_tt instances.
Export ttm_tt_destroy and ttm_tt_create symbols for test purposes.

Signed-off-by: Karolina Stolarek 
---
 drivers/gpu/drm/ttm/tests/Makefile|   1 +
 drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c |  20 ++
 drivers/gpu/drm/ttm/tests/ttm_tt_test.c   | 277 ++
 drivers/gpu/drm/ttm/ttm_tt.c  |   3 +
 4 files changed, 301 insertions(+)
 create mode 100644 drivers/gpu/drm/ttm/tests/ttm_tt_test.c

diff --git a/drivers/gpu/drm/ttm/tests/Makefile 
b/drivers/gpu/drm/ttm/tests/Makefile
index c92fe2052ef6..f570530bbb60 100644
--- a/drivers/gpu/drm/ttm/tests/Makefile
+++ b/drivers/gpu/drm/ttm/tests/Makefile
@@ -4,4 +4,5 @@ obj-$(CONFIG_DRM_TTM_KUNIT_TEST) += \
 ttm_device_test.o \
 ttm_pool_test.o \
 ttm_resource_test.o \
+ttm_tt_test.o \
 ttm_kunit_helpers.o
diff --git a/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c 
b/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c
index eccc59b981f8..112381153bbf 100644
--- a/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c
+++ b/drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c
@@ -2,9 +2,29 @@
 /*
  * Copyright © 2023 Intel Corporation
  */
+#include 
+
 #include "ttm_kunit_helpers.h"
 
+static struct ttm_tt *ttm_tt_simple_create(struct ttm_buffer_object *bo,
+  uint32_t page_flags)
+{
+   struct ttm_tt *tt;
+
+   tt = kzalloc(sizeof(*tt), GFP_KERNEL);
+   ttm_tt_init(tt, bo, 0, ttm_cached, 0);
+
+   return tt;
+}
+
+static void ttm_tt_simple_destroy(struct ttm_device *bdev, struct ttm_tt *ttm)
+{
+   kfree(ttm);
+}
+
 struct ttm_device_funcs ttm_dev_funcs = {
+   .ttm_tt_create = ttm_tt_simple_create,
+   .ttm_tt_destroy = ttm_tt_simple_destroy,
 };
 EXPORT_SYMBOL_GPL(ttm_dev_funcs);
 
diff --git a/drivers/gpu/drm/ttm/tests/ttm_tt_test.c 
b/drivers/gpu/drm/ttm/tests/ttm_tt_test.c
new file mode 100644
index ..1300ca93e523
--- /dev/null
+++ b/drivers/gpu/drm/ttm/tests/ttm_tt_test.c
@@ -0,0 +1,277 @@
+// SPDX-License-Identifier: GPL-2.0 AND MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+#include 
+#include 
+
+#include 
+
+#include "ttm_kunit_helpers.h"
+
#define BO_SIZE		SZ_4K
+
+struct ttm_tt_test_case {
+   const char *description;
+   uint32_t size;
+   uint32_t extra_pages_num;
+};
+
+static int ttm_tt_test_init(struct kunit *test)
+{
+   struct ttm_test_devices *priv;
+
+   priv = kunit_kzalloc(test, sizeof(*priv), GFP_KERNEL);
+   KUNIT_ASSERT_NOT_NULL(test, priv);
+
+   priv = ttm_test_devices_all(test);
+   test->priv = priv;
+
+   return 0;
+}
+
+static const struct ttm_tt_test_case ttm_tt_init_basic_cases[] = {
+   {
+   .description = "Page-aligned size",
+   .size = SZ_4K,
+   },
+   {
+   .description = "Misaligned size",
+   .size = SZ_8K + 1,
+   },
+   {
+   .description = "Extra pages requested",
+   .size = SZ_4K,
+   .extra_pages_num = 1,
+   },
+};
+
+static void ttm_tt_init_case_desc(const struct ttm_tt_test_case *t,
+ char *desc)
+{
+   strscpy(desc, t->description, KUNIT_PARAM_DESC_SIZE);
+}
+
+KUNIT_ARRAY_PARAM(ttm_tt_init_basic, ttm_tt_init_basic_cases,
+ ttm_tt_init_case_desc);
+
+static void ttm_tt_init_basic(struct kunit *test)
+{
+   const struct ttm_tt_test_case *params = test->param_value;
+   struct ttm_buffer_object *bo;
+   struct ttm_tt *tt;
+   uint32_t page_flags = TTM_TT_FLAG_ZERO_ALLOC;
+   enum ttm_caching caching = ttm_cached;
+   uint32_t extra_pages = params->extra_pages_num;
+   int num_pages = PAGE_ALIGN(params->size) / PAGE_SIZE;
+   int err;
+
+   tt = kunit_kzalloc(test, sizeof(*tt), GFP_KERNEL);
+   KUNIT_ASSERT_NOT_NULL(test, tt);
+
+   bo = ttm_bo_kunit_init(test, test->priv, params->size);
+
+   err = ttm_tt_init(tt, bo, page_flags, caching, extra_pages);
+   KUNIT_ASSERT_EQ(test, err, 0);
+
+   KUNIT_ASSERT_EQ(test, tt->num_pages, num_pages + extra_pages);
+
+   KUNIT_ASSERT_EQ(test, tt->page_flags, page_flags);
+   KUNIT_ASSERT_EQ(test, tt->caching, caching);
+
+   KUNIT_ASSERT_NULL(test, tt->dma_address);
+   KUNIT_ASSERT_NULL(test, tt->swap_storage);
+}
+
+static void ttm_tt_fini_basic(struct kunit *test)
+{
+   struct ttm_buffer_object *bo;
+   struct ttm_tt *tt;
+   enum ttm_caching caching = ttm_cached;
+   int err;
+
+   tt = kunit_kzalloc(test, sizeof(*tt), GFP_KERNEL);
+   KUNIT_ASSERT_NOT_NULL(test, tt);
+
+   bo = ttm_bo_kunit_init(test, test->priv, BO_SIZE);
+
+   err = ttm_tt_init(tt, bo, 0, caching, 0);
+   KUNIT_ASSERT_EQ(test, err, 0);
+   KUNIT_ASSERT_NOT_NULL(test, tt->pages);
+
+   ttm_tt_fini(tt);
+   KUNIT_ASSERT_NULL(test, tt->pages);
+}
+
+static void ttm_
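The expected page count checked in ttm_tt_init_basic() above comes from `PAGE_ALIGN(params->size) / PAGE_SIZE`: round the buffer size up to a page boundary, then count pages, which is how the "Misaligned size" case (SZ_8K + 1) lands on three pages. A userspace sketch with an assumed 4 KiB page size:

```c
#define TOY_PAGE_SIZE 4096UL	/* assumed page size for illustration */

/* Userspace stand-in for PAGE_ALIGN(size) / PAGE_SIZE: round size up to
 * the next page boundary, then return the number of pages covered. */
static unsigned long toy_num_pages(unsigned long size)
{
	unsigned long aligned = (size + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1);

	return aligned / TOY_PAGE_SIZE;
}
```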
