On 15.03.2021 01:31, Michał Mirosław wrote:
> On Thu, Mar 11, 2021 at 08:22:54PM +0300, Dmitry Osipenko wrote:
>> The display controller (DC) performs isochronous memory transfers and
>> therefore has a minimum memory bandwidth requirement that must be
>> fulfilled, otherwise framebuffer data can't be fetched fast enough,
>> which results in a DC data-FIFO underflow followed by visual
>> corruption.
> [...]
>> +static unsigned long
>> +tegra_plane_overlap_mask(struct drm_crtc_state *state,
>> +                     const struct drm_plane_state *plane_state)
>> +{
>> +    const struct drm_plane_state *other_state;
>> +    const struct tegra_plane *tegra;
>> +    unsigned long overlap_mask = 0;
>> +    struct drm_plane *plane;
>> +    struct drm_rect rect;
>> +
>> +    if (!plane_state->visible || !plane_state->fb)
>> +            return 0;
>> +
>> +    drm_atomic_crtc_state_for_each_plane_state(plane, other_state, state) {
> [...]
>> +    /*
>> +     * The data-prefetch FIFO will easily help to overcome temporary
>> +     * memory pressure if another plane overlaps with the cursor plane.
>> +     */
>> +    if (tegra_plane_is_cursor(plane_state) && overlap_mask)
>> +            return 0;
>> +
>> +    return overlap_mask;
>> +}
> 
> Since for the cursor plane this always returns 0, you could test
> tegra_plane_is_cursor() at the start of the function.

Yes, thanks.
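
For v2 I'll move the check to the top, along these lines (untested
sketch, loop body elided as in the quote above):

    static unsigned long
    tegra_plane_overlap_mask(struct drm_crtc_state *state,
                             const struct drm_plane_state *plane_state)
    {
            unsigned long overlap_mask = 0;
            ...
            /*
             * The data-prefetch FIFO will easily help to overcome
             * temporary memory pressure if another plane overlaps with
             * the cursor plane, hence the cursor contributes nothing to
             * the overlap mask and we can bail out early.
             */
            if (tegra_plane_is_cursor(plane_state))
                    return 0;

            if (!plane_state->visible || !plane_state->fb)
                    return 0;
            ...
            return overlap_mask;
    }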

>> +static int tegra_crtc_calculate_memory_bandwidth(struct drm_crtc *crtc,
>> +                                             struct drm_atomic_state *state)
> [...]
>> +    /*
>> +     * For overlapping planes, pixel data is fetched for each plane at
>> +     * the same time, hence the bandwidths accumulate in this case.
>> +     * This needs to be taken into account when calculating the total
>> +     * bandwidth consumed by all planes.
>> +     *
>> +     * Here we get the overlap state of each plane, which is a bitmask
>> +     * of plane indices telling which planes it overlaps with. Note
>> +     * that bitmask[plane] includes BIT(plane) in order to make the
>> +     * further code nicer and simpler.
>> +     */
>> +    drm_atomic_crtc_state_for_each_plane_state(plane, plane_state, new_state) {
>> +            tegra_state = to_const_tegra_plane_state(plane_state);
>> +            tegra = to_tegra_plane(plane);
>> +
>> +            if (WARN_ON_ONCE(tegra->index >= TEGRA_DC_LEGACY_PLANES_NUM))
>> +                    return -EINVAL;
>> +
>> +            plane_peak_bw[tegra->index] = tegra_state->peak_memory_bandwidth;
>> +            mask = tegra_plane_overlap_mask(new_state, plane_state);
>> +            overlap_mask[tegra->index] = mask;
>> +
>> +            if (hweight_long(mask) != 3)
>> +                    all_planes_overlap_simultaneously = false;
>> +    }
>> +
>> +    old_state = drm_atomic_get_old_crtc_state(state, crtc);
>> +    old_dc_state = to_const_dc_state(old_state);
>> +
>> +    /*
>> +     * Then we calculate the maximum bandwidth of each plane state.
>> +     * The bandwidth includes the plane's own BW plus the BW of the
>> +     * "simultaneously" overlapping planes, where "simultaneously"
>> +     * means areas where the DC fetches from the planes simultaneously
>> +     * during the scan-out process.
>> +     *
>> +     * For example, if plane A overlaps with planes B and C, but B and
>> +     * C don't overlap, then the peak bandwidth will be in either the
>> +     * area where A and B overlap or the area where A and C overlap.
>> +     *
>> +     * plane_peak_bw[] contains the peak memory bandwidth value of
>> +     * each plane; this information is needed by the interconnect
>> +     * provider in order to set up the latency allowance based on the
>> +     * peak BW, see tegra_crtc_update_memory_bandwidth().
>> +     */
>> +    for (i = 0; i < ARRAY_SIZE(plane_peak_bw); i++) {
>> +            overlap_bw = 0;
>> +
>> +            for_each_set_bit(k, &overlap_mask[i], 3) {
>> +                    if (k == i)
>> +                            continue;
>> +
>> +                    if (all_planes_overlap_simultaneously)
>> +                            overlap_bw += plane_peak_bw[k];
>> +                    else
>> +                            overlap_bw = max(overlap_bw, plane_peak_bw[k]);
>> +            }
>> +
>> +            new_dc_state->plane_peak_bw[i] = plane_peak_bw[i] + overlap_bw;
>> +
>> +            /*
>> +             * If the plane's peak bandwidth changed (for example the
>> +             * plane is no longer overlapped) and the plane isn't in
>> +             * the atomic state, then add the plane to the state in
>> +             * order to have its bandwidth updated.
>> +             */
>> +            if (old_dc_state->plane_peak_bw[i] !=
>> +                new_dc_state->plane_peak_bw[i]) {
>> +                    plane = tegra_crtc_get_plane_by_index(crtc, i);
>> +                    if (!plane)
>> +                            continue;
>> +
>> +                    plane_state = drm_atomic_get_plane_state(state, plane);
>> +                    if (IS_ERR(plane_state))
>> +                            return PTR_ERR(plane_state);
>> +            }
>> +    }
> [...]
> 
> Does it matter to which channel (plane) the peak bw is attached? Would
> it still work if the first channel specified max(peak_bw of overlaps)
> and the others specified only zeroes?

It does matter: the memory driver will configure the latency allowance
incorrectly if it is given zeroes.
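
To illustrate why (a simplified sketch of the idea only, not the
actual memory driver code; the helper name and its parameters are made
up): the latency allowance is roughly the time a client's FIFO can
cover at the client's peak rate, so a zero peak BW degenerates into a
meaningless allowance.

    #include <linux/limits.h>
    #include <linux/math64.h>

    /* Hypothetical illustration of how LA relates to the peak BW. */
    static u64 la_usec(u32 fifo_bytes, u32 peak_bw_kbps)
    {
            /* A zero peak BW would yield an unbounded allowance. */
            if (!peak_bw_kbps)
                    return U64_MAX;

            /* Time (in us) the FIFO lasts at the peak drain rate. */
            return div64_u64((u64)fifo_bytes * 1000000,
                             (u64)peak_bw_kbps * 1024);
    }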

>> +    /*
>> +     * Horizontal downscaling needs a lower memory latency, which
>> +     * roughly depends on the scaled width.  Trying to tune the
>> +     * latency of a single memory client alone will likely have a
>> +     * strong negative impact on the other memory clients, hence we
>> +     * request a higher bandwidth instead, since latency depends on
>> +     * bandwidth.  This prevents memory FIFO underflows for large
>> +     * plane downscales, while still allowing the display to share
>> +     * bandwidth fairly with the other memory clients.
>> +     */
>> +    if (src_w > dst_w)
>> +            mul = (src_w - dst_w) * bpp / 2048 + 1;
>> +    else
>> +            mul = 1;
> [...]
> 
> One point is still unexplained: why is the multiplier proportional to
> the *difference* between src and dst widths? Also, I would expect the
> max (worst case) to be pixclock * read_size when src_w/dst_w >= read_size.

IIRC, the difference gives a more adequate/practical result than the
proportion, although the downstream driver uses the proportion. I'll
try to revisit this for the next version of the patch.
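
For reference, with the current formula a 2048 -> 1024 px downscale of
an XRGB8888 (bpp = 4) plane gives:

    mul = (2048 - 1024) * 4 / 2048 + 1 = 3

while a purely proportional variant (e.g. src_w * bpp / dst_w, just a
guess at what the downstream formula may look like) would give 8, so
the difference-based multiplier grows noticeably more gently with the
downscale factor.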

> BTW, you could move this below and do: if (src_w > dst_w) peak_bandwidth *= ...

Indeed, and it should be a bit nicer to use 'mul' in both cases.
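
I.e. something like this (rough sketch for v2; peak_bandwidth is the
variable from your suggestion):

    mul = 1;

    if (src_w > dst_w)
            mul = (src_w - dst_w) * bpp / 2048 + 1;

    peak_bandwidth *= mul;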
