On Thu, Mar 11, 2021 at 08:22:54PM +0300, Dmitry Osipenko wrote:
> Display controller (DC) performs isochronous memory transfers, and thus,
> has a requirement for a minimum memory bandwidth that shall be fulfilled,
> otherwise framebuffer data can't be fetched fast enough and this results
> in a DC's data-FIFO underflow that follows by a visual corruption.
[...]
> +static unsigned long
> +tegra_plane_overlap_mask(struct drm_crtc_state *state,
> +                      const struct drm_plane_state *plane_state)
> +{
> +     const struct drm_plane_state *other_state;
> +     const struct tegra_plane *tegra;
> +     unsigned long overlap_mask = 0;
> +     struct drm_plane *plane;
> +     struct drm_rect rect;
> +
> +     if (!plane_state->visible || !plane_state->fb)
> +             return 0;
> +
> +     drm_atomic_crtc_state_for_each_plane_state(plane, other_state, state) {
[...]
> +     /*
> +      * Data-prefetch FIFO will easily help to overcome temporal memory
> +      * pressure if other plane overlaps with the cursor plane.
> +      */
> +     if (tegra_plane_is_cursor(plane_state) && overlap_mask)
> +             return 0;
> +
> +     return overlap_mask;
> +}

Since for cursor plane this always returns 0, you could test
tegra_plane_is_cursor() at the start of the function.

> +static int tegra_crtc_calculate_memory_bandwidth(struct drm_crtc *crtc,
> +                                              struct drm_atomic_state *state)
[...]
> +     /*
> +      * For overlapping planes pixel's data is fetched for each plane at
> +      * the same time, hence bandwidths are accumulated in this case.
> +      * This needs to be taken into account for calculating total bandwidth
> +      * consumed by all planes.
> +      *
> +      * Here we get the overlapping state of each plane, which is a
> +      * bitmask of plane indices telling with what planes there is an
> +      * overlap. Note that bitmask[plane] includes BIT(plane) in order
> +      * to make further code nicer and simpler.
> +      */
> +     drm_atomic_crtc_state_for_each_plane_state(plane, plane_state, 
> new_state) {
> +             tegra_state = to_const_tegra_plane_state(plane_state);
> +             tegra = to_tegra_plane(plane);
> +
> +             if (WARN_ON_ONCE(tegra->index >= TEGRA_DC_LEGACY_PLANES_NUM))
> +                     return -EINVAL;
> +
> +             plane_peak_bw[tegra->index] = 
> tegra_state->peak_memory_bandwidth;
> +             mask = tegra_plane_overlap_mask(new_state, plane_state);
> +             overlap_mask[tegra->index] = mask;
> +
> +             if (hweight_long(mask) != 3)
> +                     all_planes_overlap_simultaneously = false;
> +     }
> +
> +     old_state = drm_atomic_get_old_crtc_state(state, crtc);
> +     old_dc_state = to_const_dc_state(old_state);
> +
> +     /*
> +      * Then we calculate maximum bandwidth of each plane state.
> +      * The bandwidth includes the plane BW + BW of the "simultaneously"
> +      * overlapping planes, where "simultaneously" means areas where DC
> +      * fetches from the planes simultaneously during of scan-out process.
> +      *
> +      * For example, if plane A overlaps with planes B and C, but B and C
> +      * don't overlap, then the peak bandwidth will be either in area where
> +      * A-and-B or A-and-C planes overlap.
> +      *
> +      * The plane_peak_bw[] contains peak memory bandwidth values of
> +      * each plane, this information is needed by interconnect provider
> +      * in order to set up latency allowness based on the peak BW, see
> +      * tegra_crtc_update_memory_bandwidth().
> +      */
> +     for (i = 0; i < ARRAY_SIZE(plane_peak_bw); i++) {
> +             overlap_bw = 0;
> +
> +             for_each_set_bit(k, &overlap_mask[i], 3) {
> +                     if (k == i)
> +                             continue;
> +
> +                     if (all_planes_overlap_simultaneously)
> +                             overlap_bw += plane_peak_bw[k];
> +                     else
> +                             overlap_bw = max(overlap_bw, plane_peak_bw[k]);
> +             }
> +
> +             new_dc_state->plane_peak_bw[i] = plane_peak_bw[i] + overlap_bw;
> +
> +             /*
> +              * If plane's peak bandwidth changed (for example plane isn't
> +              * overlapped anymore) and plane isn't in the atomic state,
> +              * then add plane to the state in order to have the bandwidth
> +              * updated.
> +              */
> +             if (old_dc_state->plane_peak_bw[i] !=
> +                 new_dc_state->plane_peak_bw[i]) {
> +                     plane = tegra_crtc_get_plane_by_index(crtc, i);
> +                     if (!plane)
> +                             continue;
> +
> +                     plane_state = drm_atomic_get_plane_state(state, plane);
> +                     if (IS_ERR(plane_state))
> +                             return PTR_ERR(plane_state);
> +             }
> +     }
[...]

Does it matter to which channel (plane) the peak bw is attached? Would
it still work if the first channel specified max(peak_bw of overlaps)
and others only zeroes?

> +     /*
> +      * Horizontal downscale needs a lower memory latency, which roughly
> +      * depends on the scaled width.  Trying to tune latency of a memory
> +      * client alone will likely result in a strong negative impact on
> +      * other memory clients, hence we will request a higher bandwidth
> +      * since latency depends on bandwidth.  This allows to prevent memory
> +      * FIFO underflows for a large plane downscales, meanwhile allowing
> +      * display to share bandwidth fairly with other memory clients.
> +      */
> +     if (src_w > dst_w)
> +             mul = (src_w - dst_w) * bpp / 2048 + 1;
> +     else
> +             mul = 1;
[...]

One point is unexplained yet: why is the multiplier proportional to a
*difference* between src and dst widths? Also, I would expect max (worst
case) is pixclock * read_size when src_w/dst_w >= read_size.

BTW, you could move this below and : if (src > dst_w) peak_bandwidth *= ...

> +     /* average bandwidth in bytes/s */
> +     avg_bandwidth  = (bpp * src_w * src_h * mul + 7) / 8;
> +     avg_bandwidth *= drm_mode_vrefresh(&crtc_state->mode);
> +
> +     /* mode.clock in kHz, peak bandwidth in kbit/s */
> +     peak_bandwidth = crtc_state->mode.clock * bpp * mul;
[...]

Best Regards
Michał Mirosław

Reply via email to