Re: [PATCH v2 6/8] drm: vkms: Refactor the plane composer to accept new formats

2021-11-12 Thread Igor Torrente
Hi Pekka,

On Thu, Nov 11, 2021 at 11:37 AM Pekka Paalanen  wrote:
>
> On Thu, 11 Nov 2021 11:07:21 -0300
> Igor Torrente  wrote:
>
> > Hi Pekka,
> >
> > On Thu, Nov 11, 2021 at 6:33 AM Pekka Paalanen  wrote:
> > >
> > > On Wed, 10 Nov 2021 13:56:54 -0300
> > > Igor Torrente  wrote:
> > >
> > > > On Tue, Nov 9, 2021 at 8:40 AM Pekka Paalanen  
> > > > wrote:
> > > > >
> > > > > Hi Igor,
> > > > >
> > > > > again, that is a really nice speed-up. Unfortunately, I find the code
> > > > > rather messy and hard to follow. I hope my comments below help with
> > > > > re-designing it to be easier to understand.
> > > > >
> > > > >
> > > > > On Tue, 26 Oct 2021 08:34:06 -0300
> > > > > Igor Torrente  wrote:
> > > > >
> > > > > > Currently the blend function only accepts XRGB_ and ARGB_
> > > > > > as a color input.
> > > > > >
> > > > > > This patch refactors all the functions related to the plane 
> > > > > > composition
> > > > > > to overcome this limitation.
> > > > > >
> > > > > > Now the blend function receives a struct 
> > > > > > `vkms_pixel_composition_functions`
> > > > > > containing two handlers.
> > > > > >
> > > > > > One will generate a buffer of each line of the frame with the pixels
> > > > > > converted to ARGB16161616. And the other will take this line buffer,
> > > > > > do some computation on it, and store the pixels in the destination.
> > > > > >
> > > > > > Both the handlers have the same signature. They receive a pointer to
> > > > > > the pixels that will be processed(`pixels_addr`), the number of 
> > > > > > pixels
> > > > > > that will be treated(`length`), and the intermediate buffer of the 
> > > > > > size
> > > > > > of a frame line (`line_buffer`).
> > > > > >
> > > > > > The first function has been totally described previously.
> > > > >
> > > > > What does this sentence mean?
> > > >
> > > > In the sentence "One will generate...", I give an overview of the two 
> > > > types of
> > > > handlers. And the overview of the first handler describes the full 
> > > > behavior of
> > > > it.
> > > >
> > > > But it doesn't look clear enough, I will improve it in the future.
> > > >
> > > > >
> > > > > >
> > > > > > The second is more interesting, as it has to perform two roles 
> > > > > > depending
> > > > > > on where it is called in the code.
> > > > > >
> > > > > > The first is to convert(if necessary) the data received in the
> > > > > > `line_buffer` and write in the memory pointed by `pixels_addr`.
> > > > > >
> > > > > > The second role is to perform the `alpha_blend`. So, it takes the 
> > > > > > pixels
> > > > > > in the `line_buffer` and `pixels_addr`, executes the blend, and 
> > > > > > stores
> > > > > > the result back to the `pixels_addr`.
> > > > > >
> > > > > > The per-line implementation was chosen for performance reasons.
> > > > > > The per-pixel functions were having performance issues due to 
> > > > > > indirect
> > > > > > function call overhead.
> > > > > >
> > > > > > The per-line code trades off memory for execution time. The 
> > > > > > `line_buffer`
> > > > > > allows us to diminish the number of function calls.
> > > > > >
> > > > > > Results in the IGT test `kms_cursor_crc`:
> > > > > >
> > > > > > | Frametime   |
> > > > > > |:---:|:-:|:--:|::|
> > > > > > |  implmentation  |  Current  |  Per-pixel | Per-line |
> > > > > > | frametime range |  8~22 ms  |  32~56 ms  |  6~19 ms |
> > > > > > | Average |  10.0 ms  |   35.8 ms  |  8.6 ms  |
> > > > > >
> > > > > > Reported-by: kernel test robot 
> > > > > > Signed-off-by: Igor Torrente 
> > > > > > ---
> > > > > > V2: Improves the performance drastically, by perfoming the 
> > > > > > operations
> > > > > > per-line and not per-pixel(Pekka Paalanen).
> > > > > > Minor improvements(Pekka Paalanen).
> > > > > > ---
> > > > > >  drivers/gpu/drm/vkms/vkms_composer.c | 321 
> > > > > > ---
> > > > > >  drivers/gpu/drm/vkms/vkms_formats.h  | 155 +
> > > > > >  2 files changed, 342 insertions(+), 134 deletions(-)
> > > > > >  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c 
> > > > > > b/drivers/gpu/drm/vkms/vkms_composer.c
> > > > > > index 383ca657ddf7..69fe3a89bdc9 100644
> > > > > > --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > > > > > +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>
> ...
>
> > > > > > +struct vkms_pixel_composition_functions {
> > > > > > + void (*get_src_line)(void *pixels_addr, int length, u64 
> > > > > > *line_buffer);
> > > > > > + void (*set_output_line)(void *pixels_addr, int length, u64 
> > > > > > *line_buffer);
> > > > >
> > > > > I would be a little more comfortable if instead of u64 *line_buffer 
> > > > > you
> > > > > would have something like
> > > > >
> > > > > struct line_buffer {
> > > > > u16 *row;
> > > > > size_t nelem;
> > > > > }
> >

Re: [PATCH v2 6/8] drm: vkms: Refactor the plane composer to accept new formats

2021-11-11 Thread Pekka Paalanen
On Thu, 11 Nov 2021 11:07:21 -0300
Igor Torrente  wrote:

> Hi Pekka,
> 
> On Thu, Nov 11, 2021 at 6:33 AM Pekka Paalanen  wrote:
> >
> > On Wed, 10 Nov 2021 13:56:54 -0300
> > Igor Torrente  wrote:
> >  
> > > On Tue, Nov 9, 2021 at 8:40 AM Pekka Paalanen  
> > > wrote:  
> > > >
> > > > Hi Igor,
> > > >
> > > > again, that is a really nice speed-up. Unfortunately, I find the code
> > > > rather messy and hard to follow. I hope my comments below help with
> > > > re-designing it to be easier to understand.
> > > >
> > > >
> > > > On Tue, 26 Oct 2021 08:34:06 -0300
> > > > Igor Torrente  wrote:
> > > >  
> > > > > Currently the blend function only accepts XRGB_ and ARGB_
> > > > > as a color input.
> > > > >
> > > > > This patch refactors all the functions related to the plane 
> > > > > composition
> > > > > to overcome this limitation.
> > > > >
> > > > > Now the blend function receives a struct 
> > > > > `vkms_pixel_composition_functions`
> > > > > containing two handlers.
> > > > >
> > > > > One will generate a buffer of each line of the frame with the pixels
> > > > > converted to ARGB16161616. And the other will take this line buffer,
> > > > > do some computation on it, and store the pixels in the destination.
> > > > >
> > > > > Both the handlers have the same signature. They receive a pointer to
> > > > > the pixels that will be processed(`pixels_addr`), the number of pixels
> > > > > that will be treated(`length`), and the intermediate buffer of the 
> > > > > size
> > > > > of a frame line (`line_buffer`).
> > > > >
> > > > > The first function has been totally described previously.  
> > > >
> > > > What does this sentence mean?  
> > >
> > > In the sentence "One will generate...", I give an overview of the two 
> > > types of
> > > handlers. And the overview of the first handler describes the full 
> > > behavior of
> > > it.
> > >
> > > But it doesn't look clear enough, I will improve it in the future.
> > >  
> > > >  
> > > > >
> > > > > The second is more interesting, as it has to perform two roles 
> > > > > depending
> > > > > on where it is called in the code.
> > > > >
> > > > > The first is to convert(if necessary) the data received in the
> > > > > `line_buffer` and write in the memory pointed by `pixels_addr`.
> > > > >
> > > > > The second role is to perform the `alpha_blend`. So, it takes the 
> > > > > pixels
> > > > > in the `line_buffer` and `pixels_addr`, executes the blend, and stores
> > > > > the result back to the `pixels_addr`.
> > > > >
> > > > > The per-line implementation was chosen for performance reasons.
> > > > > The per-pixel functions were having performance issues due to indirect
> > > > > function call overhead.
> > > > >
> > > > > The per-line code trades off memory for execution time. The 
> > > > > `line_buffer`
> > > > > allows us to diminish the number of function calls.
> > > > >
> > > > > Results in the IGT test `kms_cursor_crc`:
> > > > >
> > > > > | Frametime   |
> > > > > |:---:|:-:|:--:|::|
> > > > > |  implmentation  |  Current  |  Per-pixel | Per-line |
> > > > > | frametime range |  8~22 ms  |  32~56 ms  |  6~19 ms |
> > > > > | Average |  10.0 ms  |   35.8 ms  |  8.6 ms  |
> > > > >
> > > > > Reported-by: kernel test robot 
> > > > > Signed-off-by: Igor Torrente 
> > > > > ---
> > > > > V2: Improves the performance drastically, by perfoming the operations
> > > > > per-line and not per-pixel(Pekka Paalanen).
> > > > > Minor improvements(Pekka Paalanen).
> > > > > ---
> > > > >  drivers/gpu/drm/vkms/vkms_composer.c | 321 
> > > > > ---
> > > > >  drivers/gpu/drm/vkms/vkms_formats.h  | 155 +
> > > > >  2 files changed, 342 insertions(+), 134 deletions(-)
> > > > >  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
> > > > >
> > > > > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c 
> > > > > b/drivers/gpu/drm/vkms/vkms_composer.c
> > > > > index 383ca657ddf7..69fe3a89bdc9 100644
> > > > > --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > > > > +++ b/drivers/gpu/drm/vkms/vkms_composer.c

...

> > > > > +struct vkms_pixel_composition_functions {
> > > > > + void (*get_src_line)(void *pixels_addr, int length, u64 
> > > > > *line_buffer);
> > > > > + void (*set_output_line)(void *pixels_addr, int length, u64 
> > > > > *line_buffer);  
> > > >
> > > > I would be a little more comfortable if instead of u64 *line_buffer you
> > > > would have something like
> > > >
> > > > struct line_buffer {
> > > > u16 *row;
> > > > size_t nelem;
> > > > }
> > > >
> > > > so that the functions to be plugged into these function pointers could
> > > > assert that you do not accidentally overflow the array (which would
> > > > imply a code bug in kernel).
> > > >
> > > > One could perhaps go even for:
> > > >
> > > > struct line_pixel {
> > > > u16 r, g, b, a;
> > > > };
> > > >
> > > > struct line_b

Re: [PATCH v2 6/8] drm: vkms: Refactor the plane composer to accept new formats

2021-11-11 Thread Igor Torrente
Hi Pekka,

On Thu, Nov 11, 2021 at 6:33 AM Pekka Paalanen  wrote:
>
> On Wed, 10 Nov 2021 13:56:54 -0300
> Igor Torrente  wrote:
>
> > On Tue, Nov 9, 2021 at 8:40 AM Pekka Paalanen  wrote:
> > >
> > > Hi Igor,
> > >
> > > again, that is a really nice speed-up. Unfortunately, I find the code
> > > rather messy and hard to follow. I hope my comments below help with
> > > re-designing it to be easier to understand.
> > >
> > >
> > > On Tue, 26 Oct 2021 08:34:06 -0300
> > > Igor Torrente  wrote:
> > >
> > > > Currently the blend function only accepts XRGB_ and ARGB_
> > > > as a color input.
> > > >
> > > > This patch refactors all the functions related to the plane composition
> > > > to overcome this limitation.
> > > >
> > > > Now the blend function receives a struct 
> > > > `vkms_pixel_composition_functions`
> > > > containing two handlers.
> > > >
> > > > One will generate a buffer of each line of the frame with the pixels
> > > > converted to ARGB16161616. And the other will take this line buffer,
> > > > do some computation on it, and store the pixels in the destination.
> > > >
> > > > Both the handlers have the same signature. They receive a pointer to
> > > > the pixels that will be processed(`pixels_addr`), the number of pixels
> > > > that will be treated(`length`), and the intermediate buffer of the size
> > > > of a frame line (`line_buffer`).
> > > >
> > > > The first function has been totally described previously.
> > >
> > > What does this sentence mean?
> >
> > In the sentence "One will generate...", I give an overview of the two types 
> > of
> > handlers. And the overview of the first handler describes the full behavior 
> > of
> > it.
> >
> > But it doesn't look clear enough, I will improve it in the future.
> >
> > >
> > > >
> > > > The second is more interesting, as it has to perform two roles depending
> > > > on where it is called in the code.
> > > >
> > > > The first is to convert(if necessary) the data received in the
> > > > `line_buffer` and write in the memory pointed by `pixels_addr`.
> > > >
> > > > The second role is to perform the `alpha_blend`. So, it takes the pixels
> > > > in the `line_buffer` and `pixels_addr`, executes the blend, and stores
> > > > the result back to the `pixels_addr`.
> > > >
> > > > The per-line implementation was chosen for performance reasons.
> > > > The per-pixel functions were having performance issues due to indirect
> > > > function call overhead.
> > > >
> > > > The per-line code trades off memory for execution time. The 
> > > > `line_buffer`
> > > > allows us to diminish the number of function calls.
> > > >
> > > > Results in the IGT test `kms_cursor_crc`:
> > > >
> > > > | Frametime   |
> > > > |:---:|:-:|:--:|::|
> > > > |  implmentation  |  Current  |  Per-pixel | Per-line |
> > > > | frametime range |  8~22 ms  |  32~56 ms  |  6~19 ms |
> > > > | Average |  10.0 ms  |   35.8 ms  |  8.6 ms  |
> > > >
> > > > Reported-by: kernel test robot 
> > > > Signed-off-by: Igor Torrente 
> > > > ---
> > > > V2: Improves the performance drastically, by perfoming the operations
> > > > per-line and not per-pixel(Pekka Paalanen).
> > > > Minor improvements(Pekka Paalanen).
> > > > ---
> > > >  drivers/gpu/drm/vkms/vkms_composer.c | 321 ---
> > > >  drivers/gpu/drm/vkms/vkms_formats.h  | 155 +
> > > >  2 files changed, 342 insertions(+), 134 deletions(-)
> > > >  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
> > > >
> > > > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c 
> > > > b/drivers/gpu/drm/vkms/vkms_composer.c
> > > > index 383ca657ddf7..69fe3a89bdc9 100644
> > > > --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > > > +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> > > > @@ -9,18 +9,26 @@
> > > >  #include 
> > > >
> > > >  #include "vkms_drv.h"
> > > > -
> > > > -static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
> > > > -  const struct vkms_composer *composer)
> > > > -{
> > > > - u32 pixel;
> > > > - int src_offset = composer->offset + (y * composer->pitch)
> > > > -   + (x * composer->cpp);
> > > > -
> > > > - pixel = *(u32 *)&buffer[src_offset];
> > > > -
> > > > - return pixel;
> > > > -}
> > > > +#include "vkms_formats.h"
> > > > +
> > > > +#define get_output_vkms_composer(buffer_pointer, composer)   \
> > > > + ((struct vkms_composer) {   \
> > > > + .fb = &(struct drm_framebuffer) {   \
> > > > + .format = &(struct drm_format_info) {   \
> > > > + .format = DRM_FORMAT_ARGB16161616,  \
> > > > + },  \
> > >
> > > Is that really how one can initialize a drm_format_info? Does that
> > > struct not have a lot mo

Re: [PATCH v2 6/8] drm: vkms: Refactor the plane composer to accept new formats

2021-11-11 Thread Pekka Paalanen
On Wed, 10 Nov 2021 13:56:54 -0300
Igor Torrente  wrote:

> On Tue, Nov 9, 2021 at 8:40 AM Pekka Paalanen  wrote:
> >
> > Hi Igor,
> >
> > again, that is a really nice speed-up. Unfortunately, I find the code
> > rather messy and hard to follow. I hope my comments below help with
> > re-designing it to be easier to understand.
> >
> >
> > On Tue, 26 Oct 2021 08:34:06 -0300
> > Igor Torrente  wrote:
> >  
> > > Currently the blend function only accepts XRGB_ and ARGB_
> > > as a color input.
> > >
> > > This patch refactors all the functions related to the plane composition
> > > to overcome this limitation.
> > >
> > > Now the blend function receives a struct 
> > > `vkms_pixel_composition_functions`
> > > containing two handlers.
> > >
> > > One will generate a buffer of each line of the frame with the pixels
> > > converted to ARGB16161616. And the other will take this line buffer,
> > > do some computation on it, and store the pixels in the destination.
> > >
> > > Both the handlers have the same signature. They receive a pointer to
> > > the pixels that will be processed(`pixels_addr`), the number of pixels
> > > that will be treated(`length`), and the intermediate buffer of the size
> > > of a frame line (`line_buffer`).
> > >
> > > The first function has been totally described previously.  
> >
> > What does this sentence mean?  
> 
> In the sentence "One will generate...", I give an overview of the two types of
> handlers. And the overview of the first handler describes the full behavior of
> it.
> 
> But it doesn't look clear enough, I will improve it in the future.
> 
> >  
> > >
> > > The second is more interesting, as it has to perform two roles depending
> > > on where it is called in the code.
> > >
> > > The first is to convert(if necessary) the data received in the
> > > `line_buffer` and write in the memory pointed by `pixels_addr`.
> > >
> > > The second role is to perform the `alpha_blend`. So, it takes the pixels
> > > in the `line_buffer` and `pixels_addr`, executes the blend, and stores
> > > the result back to the `pixels_addr`.
> > >
> > > The per-line implementation was chosen for performance reasons.
> > > The per-pixel functions were having performance issues due to indirect
> > > function call overhead.
> > >
> > > The per-line code trades off memory for execution time. The `line_buffer`
> > > allows us to diminish the number of function calls.
> > >
> > > Results in the IGT test `kms_cursor_crc`:
> > >
> > > | Frametime   |
> > > |:---:|:-:|:--:|::|
> > > |  implmentation  |  Current  |  Per-pixel | Per-line |
> > > | frametime range |  8~22 ms  |  32~56 ms  |  6~19 ms |
> > > | Average |  10.0 ms  |   35.8 ms  |  8.6 ms  |
> > >
> > > Reported-by: kernel test robot 
> > > Signed-off-by: Igor Torrente 
> > > ---
> > > V2: Improves the performance drastically, by perfoming the operations
> > > per-line and not per-pixel(Pekka Paalanen).
> > > Minor improvements(Pekka Paalanen).
> > > ---
> > >  drivers/gpu/drm/vkms/vkms_composer.c | 321 ---
> > >  drivers/gpu/drm/vkms/vkms_formats.h  | 155 +
> > >  2 files changed, 342 insertions(+), 134 deletions(-)
> > >  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
> > >
> > > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c 
> > > b/drivers/gpu/drm/vkms/vkms_composer.c
> > > index 383ca657ddf7..69fe3a89bdc9 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > > +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> > > @@ -9,18 +9,26 @@
> > >  #include 
> > >
> > >  #include "vkms_drv.h"
> > > -
> > > -static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
> > > -  const struct vkms_composer *composer)
> > > -{
> > > - u32 pixel;
> > > - int src_offset = composer->offset + (y * composer->pitch)
> > > -   + (x * composer->cpp);
> > > -
> > > - pixel = *(u32 *)&buffer[src_offset];
> > > -
> > > - return pixel;
> > > -}
> > > +#include "vkms_formats.h"
> > > +
> > > +#define get_output_vkms_composer(buffer_pointer, composer)   \
> > > + ((struct vkms_composer) {   \
> > > + .fb = &(struct drm_framebuffer) {   \
> > > + .format = &(struct drm_format_info) {   \
> > > + .format = DRM_FORMAT_ARGB16161616,  \
> > > + },  \  
> >
> > Is that really how one can initialize a drm_format_info? Does that
> > struct not have a lot more fields? Shouldn't you call a function to
> > look up the proper struct with all fields populated?  
> 
> I did this macro to just fill the necessary fields, and add more of them
> as necessary.
> 
> I was implementing something very similar to the algorithm that
> you described below. So this macro 

Re: [PATCH v2 6/8] drm: vkms: Refactor the plane composer to accept new formats

2021-11-10 Thread Igor Torrente
On Tue, Nov 9, 2021 at 8:40 AM Pekka Paalanen  wrote:
>
> Hi Igor,
>
> again, that is a really nice speed-up. Unfortunately, I find the code
> rather messy and hard to follow. I hope my comments below help with
> re-designing it to be easier to understand.
>
>
> On Tue, 26 Oct 2021 08:34:06 -0300
> Igor Torrente  wrote:
>
> > Currently the blend function only accepts XRGB_ and ARGB_
> > as a color input.
> >
> > This patch refactors all the functions related to the plane composition
> > to overcome this limitation.
> >
> > Now the blend function receives a struct `vkms_pixel_composition_functions`
> > containing two handlers.
> >
> > One will generate a buffer of each line of the frame with the pixels
> > converted to ARGB16161616. And the other will take this line buffer,
> > do some computation on it, and store the pixels in the destination.
> >
> > Both the handlers have the same signature. They receive a pointer to
> > the pixels that will be processed(`pixels_addr`), the number of pixels
> > that will be treated(`length`), and the intermediate buffer of the size
> > of a frame line (`line_buffer`).
> >
> > The first function has been totally described previously.
>
> What does this sentence mean?

In the sentence "One will generate...", I give an overview of the two types of
handlers. And the overview of the first handler describes the full behavior of
it.

But it doesn't look clear enough, I will improve it in the future.

>
> >
> > The second is more interesting, as it has to perform two roles depending
> > on where it is called in the code.
> >
> > The first is to convert(if necessary) the data received in the
> > `line_buffer` and write in the memory pointed by `pixels_addr`.
> >
> > The second role is to perform the `alpha_blend`. So, it takes the pixels
> > in the `line_buffer` and `pixels_addr`, executes the blend, and stores
> > the result back to the `pixels_addr`.
> >
> > The per-line implementation was chosen for performance reasons.
> > The per-pixel functions were having performance issues due to indirect
> > function call overhead.
> >
> > The per-line code trades off memory for execution time. The `line_buffer`
> > allows us to diminish the number of function calls.
> >
> > Results in the IGT test `kms_cursor_crc`:
> >
> > | Frametime   |
> > |:---:|:-:|:--:|::|
> > |  implmentation  |  Current  |  Per-pixel | Per-line |
> > | frametime range |  8~22 ms  |  32~56 ms  |  6~19 ms |
> > | Average |  10.0 ms  |   35.8 ms  |  8.6 ms  |
> >
> > Reported-by: kernel test robot 
> > Signed-off-by: Igor Torrente 
> > ---
> > V2: Improves the performance drastically, by perfoming the operations
> > per-line and not per-pixel(Pekka Paalanen).
> > Minor improvements(Pekka Paalanen).
> > ---
> >  drivers/gpu/drm/vkms/vkms_composer.c | 321 ---
> >  drivers/gpu/drm/vkms/vkms_formats.h  | 155 +
> >  2 files changed, 342 insertions(+), 134 deletions(-)
> >  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
> >
> > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c 
> > b/drivers/gpu/drm/vkms/vkms_composer.c
> > index 383ca657ddf7..69fe3a89bdc9 100644
> > --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> > @@ -9,18 +9,26 @@
> >  #include 
> >
> >  #include "vkms_drv.h"
> > -
> > -static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
> > -  const struct vkms_composer *composer)
> > -{
> > - u32 pixel;
> > - int src_offset = composer->offset + (y * composer->pitch)
> > -   + (x * composer->cpp);
> > -
> > - pixel = *(u32 *)&buffer[src_offset];
> > -
> > - return pixel;
> > -}
> > +#include "vkms_formats.h"
> > +
> > +#define get_output_vkms_composer(buffer_pointer, composer)   \
> > + ((struct vkms_composer) {   \
> > + .fb = &(struct drm_framebuffer) {   \
> > + .format = &(struct drm_format_info) {   \
> > + .format = DRM_FORMAT_ARGB16161616,  \
> > + },  \
>
> Is that really how one can initialize a drm_format_info? Does that
> struct not have a lot more fields? Shouldn't you call a function to
> look up the proper struct with all fields populated?

I did this macro to just fill the necessary fields, and add more of them
as necessary.

I was implementing something very similar to the algorithm that
you described below. So this macro will not exist in the next version.

>
> > + },  \
> > + .map[0].vaddr = (buffer_pointer),   \
> > + .src = (composer)->src, \
> > + .dst = (composer)->dst, 

Re: [PATCH v2 6/8] drm: vkms: Refactor the plane composer to accept new formats

2021-11-09 Thread Pekka Paalanen
Hi Igor,

again, that is a really nice speed-up. Unfortunately, I find the code
rather messy and hard to follow. I hope my comments below help with
re-designing it to be easier to understand.


On Tue, 26 Oct 2021 08:34:06 -0300
Igor Torrente  wrote:

> Currently the blend function only accepts XRGB_ and ARGB_
> as a color input.
> 
> This patch refactors all the functions related to the plane composition
> to overcome this limitation.
> 
> Now the blend function receives a struct `vkms_pixel_composition_functions`
> containing two handlers.
> 
> One will generate a buffer of each line of the frame with the pixels
> converted to ARGB16161616. And the other will take this line buffer,
> do some computation on it, and store the pixels in the destination.
> 
> Both the handlers have the same signature. They receive a pointer to
> the pixels that will be processed(`pixels_addr`), the number of pixels
> that will be treated(`length`), and the intermediate buffer of the size
> of a frame line (`line_buffer`).
> 
> The first function has been totally described previously.

What does this sentence mean?

> 
> The second is more interesting, as it has to perform two roles depending
> on where it is called in the code.
> 
> The first is to convert(if necessary) the data received in the
> `line_buffer` and write in the memory pointed by `pixels_addr`.
> 
> The second role is to perform the `alpha_blend`. So, it takes the pixels
> in the `line_buffer` and `pixels_addr`, executes the blend, and stores
> the result back to the `pixels_addr`.
> 
> The per-line implementation was chosen for performance reasons.
> The per-pixel functions were having performance issues due to indirect
> function call overhead.
> 
> The per-line code trades off memory for execution time. The `line_buffer`
> allows us to diminish the number of function calls.
> 
> Results in the IGT test `kms_cursor_crc`:
> 
> | Frametime   |
> |:---:|:-:|:--:|::|
> |  implmentation  |  Current  |  Per-pixel | Per-line |
> | frametime range |  8~22 ms  |  32~56 ms  |  6~19 ms |
> | Average |  10.0 ms  |   35.8 ms  |  8.6 ms  |
> 
> Reported-by: kernel test robot 
> Signed-off-by: Igor Torrente 
> ---
> V2: Improves the performance drastically, by perfoming the operations
> per-line and not per-pixel(Pekka Paalanen).
> Minor improvements(Pekka Paalanen).
> ---
>  drivers/gpu/drm/vkms/vkms_composer.c | 321 ---
>  drivers/gpu/drm/vkms/vkms_formats.h  | 155 +
>  2 files changed, 342 insertions(+), 134 deletions(-)
>  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c 
> b/drivers/gpu/drm/vkms/vkms_composer.c
> index 383ca657ddf7..69fe3a89bdc9 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -9,18 +9,26 @@
>  #include 
>  
>  #include "vkms_drv.h"
> -
> -static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
> -  const struct vkms_composer *composer)
> -{
> - u32 pixel;
> - int src_offset = composer->offset + (y * composer->pitch)
> -   + (x * composer->cpp);
> -
> - pixel = *(u32 *)&buffer[src_offset];
> -
> - return pixel;
> -}
> +#include "vkms_formats.h"
> +
> +#define get_output_vkms_composer(buffer_pointer, composer)   \
> + ((struct vkms_composer) {   \
> + .fb = &(struct drm_framebuffer) {   \
> + .format = &(struct drm_format_info) {   \
> + .format = DRM_FORMAT_ARGB16161616,  \
> + },  \

Is that really how one can initialize a drm_format_info? Does that
struct not have a lot more fields? Shouldn't you call a function to
look up the proper struct with all fields populated?

> + },  \
> + .map[0].vaddr = (buffer_pointer),   \
> + .src = (composer)->src, \
> + .dst = (composer)->dst, \
> + .cpp = sizeof(u64), \
> + .pitch = drm_rect_width(&(composer)->dst) * sizeof(u64) \
> + })

Why is this a macro rather than a function?

> +
> +struct vkms_pixel_composition_functions {
> + void (*get_src_line)(void *pixels_addr, int length, u64 *line_buffer);
> + void (*set_output_line)(void *pixels_addr, int length, u64 
> *line_buffer);

I would be a little more comfortable if instead of u64 *line_buffer you
would have something like

struct line_buffer {
u16 *row;
size_t nelem;
}

so that the functions to be plugged into these function pointers could
assert that you do not