Re: [PATCH v2 6/8] drm: vkms: Refactor the plane composer to accept new formats
Hi Pekka, On Thu, Nov 11, 2021 at 11:37 AM Pekka Paalanen wrote: > > On Thu, 11 Nov 2021 11:07:21 -0300 > Igor Torrente wrote: > > > Hi Pekka, > > > > On Thu, Nov 11, 2021 at 6:33 AM Pekka Paalanen wrote: > > > > > > On Wed, 10 Nov 2021 13:56:54 -0300 > > > Igor Torrente wrote: > > > > > > > On Tue, Nov 9, 2021 at 8:40 AM Pekka Paalanen > > > > wrote: > > > > > > > > > > Hi Igor, > > > > > > > > > > again, that is a really nice speed-up. Unfortunately, I find the code > > > > > rather messy and hard to follow. I hope my comments below help with > > > > > re-designing it to be easier to understand. > > > > > > > > > > > > > > > On Tue, 26 Oct 2021 08:34:06 -0300 > > > > > Igor Torrente wrote: > > > > > > > > > > > Currently the blend function only accepts XRGB_ and ARGB_ > > > > > > as a color input. > > > > > > > > > > > > This patch refactors all the functions related to the plane > > > > > > composition > > > > > > to overcome this limitation. > > > > > > > > > > > > Now the blend function receives a struct > > > > > > `vkms_pixel_composition_functions` > > > > > > containing two handlers. > > > > > > > > > > > > One will generate a buffer of each line of the frame with the pixels > > > > > > converted to ARGB16161616. And the other will take this line buffer, > > > > > > do some computation on it, and store the pixels in the destination. > > > > > > > > > > > > Both the handlers have the same signature. They receive a pointer to > > > > > > the pixels that will be processed(`pixels_addr`), the number of > > > > > > pixels > > > > > > that will be treated(`length`), and the intermediate buffer of the > > > > > > size > > > > > > of a frame line (`line_buffer`). > > > > > > > > > > > > The first function has been totally described previously. > > > > > > > > > > What does this sentence mean? > > > > > > > > In the sentence "One will generate...", I give an overview of the two > > > > types of > > > > handlers. And the overview of the first handler describes the full > > > > behavior of > > > > it. > > > > > > > > But it doesn't look clear enough, I will improve it in the future. > > > > > > > > > > > > > > > > > > > > > The second is more interesting, as it has to perform two roles > > > > > > depending > > > > > > on where it is called in the code. > > > > > > > > > > > > The first is to convert(if necessary) the data received in the > > > > > > `line_buffer` and write in the memory pointed by `pixels_addr`. > > > > > > > > > > > > The second role is to perform the `alpha_blend`. So, it takes the > > > > > > pixels > > > > > > in the `line_buffer` and `pixels_addr`, executes the blend, and > > > > > > stores > > > > > > the result back to the `pixels_addr`. > > > > > > > > > > > > The per-line implementation was chosen for performance reasons. > > > > > > The per-pixel functions were having performance issues due to > > > > > > indirect > > > > > > function call overhead. > > > > > > > > > > > > The per-line code trades off memory for execution time. The > > > > > > `line_buffer` > > > > > > allows us to diminish the number of function calls. > > > > > > > > > > > > Results in the IGT test `kms_cursor_crc`: > > > > > > > > > > > > | Frametime | > > > > > > |:---:|:-:|:--:|::| > > > > > > | implmentation | Current | Per-pixel | Per-line | > > > > > > | frametime range | 8~22 ms | 32~56 ms | 6~19 ms | > > > > > > | Average | 10.0 ms | 35.8 ms | 8.6 ms | > > > > > > > > > > > > Reported-by: kernel test robot > > > > > > Signed-off-by: Igor Torrente > > > > > > --- > > > > > > V2: Improves the performance drastically, by perfoming the > > > > > > operations > > > > > > per-line and not per-pixel(Pekka Paalanen). > > > > > > Minor improvements(Pekka Paalanen). > > > > > > --- > > > > > > drivers/gpu/drm/vkms/vkms_composer.c | 321 > > > > > > --- > > > > > > drivers/gpu/drm/vkms/vkms_formats.h | 155 + > > > > > > 2 files changed, 342 insertions(+), 134 deletions(-) > > > > > > create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h > > > > > > > > > > > > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c > > > > > > b/drivers/gpu/drm/vkms/vkms_composer.c > > > > > > index 383ca657ddf7..69fe3a89bdc9 100644 > > > > > > --- a/drivers/gpu/drm/vkms/vkms_composer.c > > > > > > +++ b/drivers/gpu/drm/vkms/vkms_composer.c > > ... > > > > > > > +struct vkms_pixel_composition_functions { > > > > > > + void (*get_src_line)(void *pixels_addr, int length, u64 > > > > > > *line_buffer); > > > > > > + void (*set_output_line)(void *pixels_addr, int length, u64 > > > > > > *line_buffer); > > > > > > > > > > I would be a little more comfortable if instead of u64 *line_buffer > > > > > you > > > > > would have something like > > > > > > > > > > struct line_buffer { > > > > > u16 *row; > > > > > size_t nelem; > > > > > } > >
Re: [PATCH v2 6/8] drm: vkms: Refactor the plane composer to accept new formats
On Thu, 11 Nov 2021 11:07:21 -0300 Igor Torrente wrote: > Hi Pekka, > > On Thu, Nov 11, 2021 at 6:33 AM Pekka Paalanen wrote: > > > > On Wed, 10 Nov 2021 13:56:54 -0300 > > Igor Torrente wrote: > > > > > On Tue, Nov 9, 2021 at 8:40 AM Pekka Paalanen > > > wrote: > > > > > > > > Hi Igor, > > > > > > > > again, that is a really nice speed-up. Unfortunately, I find the code > > > > rather messy and hard to follow. I hope my comments below help with > > > > re-designing it to be easier to understand. > > > > > > > > > > > > On Tue, 26 Oct 2021 08:34:06 -0300 > > > > Igor Torrente wrote: > > > > > > > > > Currently the blend function only accepts XRGB_ and ARGB_ > > > > > as a color input. > > > > > > > > > > This patch refactors all the functions related to the plane > > > > > composition > > > > > to overcome this limitation. > > > > > > > > > > Now the blend function receives a struct > > > > > `vkms_pixel_composition_functions` > > > > > containing two handlers. > > > > > > > > > > One will generate a buffer of each line of the frame with the pixels > > > > > converted to ARGB16161616. And the other will take this line buffer, > > > > > do some computation on it, and store the pixels in the destination. > > > > > > > > > > Both the handlers have the same signature. They receive a pointer to > > > > > the pixels that will be processed(`pixels_addr`), the number of pixels > > > > > that will be treated(`length`), and the intermediate buffer of the > > > > > size > > > > > of a frame line (`line_buffer`). > > > > > > > > > > The first function has been totally described previously. > > > > > > > > What does this sentence mean? > > > > > > In the sentence "One will generate...", I give an overview of the two > > > types of > > > handlers. And the overview of the first handler describes the full > > > behavior of > > > it. > > > > > > But it doesn't look clear enough, I will improve it in the future. > > > > > > > > > > > > > > > > > The second is more interesting, as it has to perform two roles > > > > > depending > > > > > on where it is called in the code. > > > > > > > > > > The first is to convert(if necessary) the data received in the > > > > > `line_buffer` and write in the memory pointed by `pixels_addr`. > > > > > > > > > > The second role is to perform the `alpha_blend`. So, it takes the > > > > > pixels > > > > > in the `line_buffer` and `pixels_addr`, executes the blend, and stores > > > > > the result back to the `pixels_addr`. > > > > > > > > > > The per-line implementation was chosen for performance reasons. > > > > > The per-pixel functions were having performance issues due to indirect > > > > > function call overhead. > > > > > > > > > > The per-line code trades off memory for execution time. The > > > > > `line_buffer` > > > > > allows us to diminish the number of function calls. > > > > > > > > > > Results in the IGT test `kms_cursor_crc`: > > > > > > > > > > | Frametime | > > > > > |:---:|:-:|:--:|::| > > > > > | implmentation | Current | Per-pixel | Per-line | > > > > > | frametime range | 8~22 ms | 32~56 ms | 6~19 ms | > > > > > | Average | 10.0 ms | 35.8 ms | 8.6 ms | > > > > > > > > > > Reported-by: kernel test robot > > > > > Signed-off-by: Igor Torrente > > > > > --- > > > > > V2: Improves the performance drastically, by perfoming the operations > > > > > per-line and not per-pixel(Pekka Paalanen). > > > > > Minor improvements(Pekka Paalanen). > > > > > --- > > > > > drivers/gpu/drm/vkms/vkms_composer.c | 321 > > > > > --- > > > > > drivers/gpu/drm/vkms/vkms_formats.h | 155 + > > > > > 2 files changed, 342 insertions(+), 134 deletions(-) > > > > > create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h > > > > > > > > > > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c > > > > > b/drivers/gpu/drm/vkms/vkms_composer.c > > > > > index 383ca657ddf7..69fe3a89bdc9 100644 > > > > > --- a/drivers/gpu/drm/vkms/vkms_composer.c > > > > > +++ b/drivers/gpu/drm/vkms/vkms_composer.c ... > > > > > +struct vkms_pixel_composition_functions { > > > > > + void (*get_src_line)(void *pixels_addr, int length, u64 > > > > > *line_buffer); > > > > > + void (*set_output_line)(void *pixels_addr, int length, u64 > > > > > *line_buffer); > > > > > > > > I would be a little more comfortable if instead of u64 *line_buffer you > > > > would have something like > > > > > > > > struct line_buffer { > > > > u16 *row; > > > > size_t nelem; > > > > } > > > > > > > > so that the functions to be plugged into these function pointers could > > > > assert that you do not accidentally overflow the array (which would > > > > imply a code bug in kernel). > > > > > > > > One could perhaps go even for: > > > > > > > > struct line_pixel { > > > > u16 r, g, b, a; > > > > }; > > > > > > > > struct line_b
Re: [PATCH v2 6/8] drm: vkms: Refactor the plane composer to accept new formats
Hi Pekka, On Thu, Nov 11, 2021 at 6:33 AM Pekka Paalanen wrote: > > On Wed, 10 Nov 2021 13:56:54 -0300 > Igor Torrente wrote: > > > On Tue, Nov 9, 2021 at 8:40 AM Pekka Paalanen wrote: > > > > > > Hi Igor, > > > > > > again, that is a really nice speed-up. Unfortunately, I find the code > > > rather messy and hard to follow. I hope my comments below help with > > > re-designing it to be easier to understand. > > > > > > > > > On Tue, 26 Oct 2021 08:34:06 -0300 > > > Igor Torrente wrote: > > > > > > > Currently the blend function only accepts XRGB_ and ARGB_ > > > > as a color input. > > > > > > > > This patch refactors all the functions related to the plane composition > > > > to overcome this limitation. > > > > > > > > Now the blend function receives a struct > > > > `vkms_pixel_composition_functions` > > > > containing two handlers. > > > > > > > > One will generate a buffer of each line of the frame with the pixels > > > > converted to ARGB16161616. And the other will take this line buffer, > > > > do some computation on it, and store the pixels in the destination. > > > > > > > > Both the handlers have the same signature. They receive a pointer to > > > > the pixels that will be processed(`pixels_addr`), the number of pixels > > > > that will be treated(`length`), and the intermediate buffer of the size > > > > of a frame line (`line_buffer`). > > > > > > > > The first function has been totally described previously. > > > > > > What does this sentence mean? > > > > In the sentence "One will generate...", I give an overview of the two types > > of > > handlers. And the overview of the first handler describes the full behavior > > of > > it. > > > > But it doesn't look clear enough, I will improve it in the future. > > > > > > > > > > > > > The second is more interesting, as it has to perform two roles depending > > > > on where it is called in the code. > > > > > > > > The first is to convert(if necessary) the data received in the > > > > `line_buffer` and write in the memory pointed by `pixels_addr`. > > > > > > > > The second role is to perform the `alpha_blend`. So, it takes the pixels > > > > in the `line_buffer` and `pixels_addr`, executes the blend, and stores > > > > the result back to the `pixels_addr`. > > > > > > > > The per-line implementation was chosen for performance reasons. > > > > The per-pixel functions were having performance issues due to indirect > > > > function call overhead. > > > > > > > > The per-line code trades off memory for execution time. The > > > > `line_buffer` > > > > allows us to diminish the number of function calls. > > > > > > > > Results in the IGT test `kms_cursor_crc`: > > > > > > > > | Frametime | > > > > |:---:|:-:|:--:|::| > > > > | implmentation | Current | Per-pixel | Per-line | > > > > | frametime range | 8~22 ms | 32~56 ms | 6~19 ms | > > > > | Average | 10.0 ms | 35.8 ms | 8.6 ms | > > > > > > > > Reported-by: kernel test robot > > > > Signed-off-by: Igor Torrente > > > > --- > > > > V2: Improves the performance drastically, by perfoming the operations > > > > per-line and not per-pixel(Pekka Paalanen). > > > > Minor improvements(Pekka Paalanen). > > > > --- > > > > drivers/gpu/drm/vkms/vkms_composer.c | 321 --- > > > > drivers/gpu/drm/vkms/vkms_formats.h | 155 + > > > > 2 files changed, 342 insertions(+), 134 deletions(-) > > > > create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h > > > > > > > > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c > > > > b/drivers/gpu/drm/vkms/vkms_composer.c > > > > index 383ca657ddf7..69fe3a89bdc9 100644 > > > > --- a/drivers/gpu/drm/vkms/vkms_composer.c > > > > +++ b/drivers/gpu/drm/vkms/vkms_composer.c > > > > @@ -9,18 +9,26 @@ > > > > #include > > > > > > > > #include "vkms_drv.h" > > > > - > > > > -static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer, > > > > - const struct vkms_composer *composer) > > > > -{ > > > > - u32 pixel; > > > > - int src_offset = composer->offset + (y * composer->pitch) > > > > - + (x * composer->cpp); > > > > - > > > > - pixel = *(u32 *)&buffer[src_offset]; > > > > - > > > > - return pixel; > > > > -} > > > > +#include "vkms_formats.h" > > > > + > > > > +#define get_output_vkms_composer(buffer_pointer, composer) \ > > > > + ((struct vkms_composer) { \ > > > > + .fb = &(struct drm_framebuffer) { \ > > > > + .format = &(struct drm_format_info) { \ > > > > + .format = DRM_FORMAT_ARGB16161616, \ > > > > + }, \ > > > > > > Is that really how one can initialize a drm_format_info? Does that > > > struct not have a lot mo
Re: [PATCH v2 6/8] drm: vkms: Refactor the plane composer to accept new formats
On Wed, 10 Nov 2021 13:56:54 -0300 Igor Torrente wrote: > On Tue, Nov 9, 2021 at 8:40 AM Pekka Paalanen wrote: > > > > Hi Igor, > > > > again, that is a really nice speed-up. Unfortunately, I find the code > > rather messy and hard to follow. I hope my comments below help with > > re-designing it to be easier to understand. > > > > > > On Tue, 26 Oct 2021 08:34:06 -0300 > > Igor Torrente wrote: > > > > > Currently the blend function only accepts XRGB_ and ARGB_ > > > as a color input. > > > > > > This patch refactors all the functions related to the plane composition > > > to overcome this limitation. > > > > > > Now the blend function receives a struct > > > `vkms_pixel_composition_functions` > > > containing two handlers. > > > > > > One will generate a buffer of each line of the frame with the pixels > > > converted to ARGB16161616. And the other will take this line buffer, > > > do some computation on it, and store the pixels in the destination. > > > > > > Both the handlers have the same signature. They receive a pointer to > > > the pixels that will be processed(`pixels_addr`), the number of pixels > > > that will be treated(`length`), and the intermediate buffer of the size > > > of a frame line (`line_buffer`). > > > > > > The first function has been totally described previously. > > > > What does this sentence mean? > > In the sentence "One will generate...", I give an overview of the two types of > handlers. And the overview of the first handler describes the full behavior of > it. > > But it doesn't look clear enough, I will improve it in the future. > > > > > > > > > The second is more interesting, as it has to perform two roles depending > > > on where it is called in the code. > > > > > > The first is to convert(if necessary) the data received in the > > > `line_buffer` and write in the memory pointed by `pixels_addr`. > > > > > > The second role is to perform the `alpha_blend`. So, it takes the pixels > > > in the `line_buffer` and `pixels_addr`, executes the blend, and stores > > > the result back to the `pixels_addr`. > > > > > > The per-line implementation was chosen for performance reasons. > > > The per-pixel functions were having performance issues due to indirect > > > function call overhead. > > > > > > The per-line code trades off memory for execution time. The `line_buffer` > > > allows us to diminish the number of function calls. > > > > > > Results in the IGT test `kms_cursor_crc`: > > > > > > | Frametime | > > > |:---:|:-:|:--:|::| > > > | implmentation | Current | Per-pixel | Per-line | > > > | frametime range | 8~22 ms | 32~56 ms | 6~19 ms | > > > | Average | 10.0 ms | 35.8 ms | 8.6 ms | > > > > > > Reported-by: kernel test robot > > > Signed-off-by: Igor Torrente > > > --- > > > V2: Improves the performance drastically, by perfoming the operations > > > per-line and not per-pixel(Pekka Paalanen). > > > Minor improvements(Pekka Paalanen). > > > --- > > > drivers/gpu/drm/vkms/vkms_composer.c | 321 --- > > > drivers/gpu/drm/vkms/vkms_formats.h | 155 + > > > 2 files changed, 342 insertions(+), 134 deletions(-) > > > create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h > > > > > > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c > > > b/drivers/gpu/drm/vkms/vkms_composer.c > > > index 383ca657ddf7..69fe3a89bdc9 100644 > > > --- a/drivers/gpu/drm/vkms/vkms_composer.c > > > +++ b/drivers/gpu/drm/vkms/vkms_composer.c > > > @@ -9,18 +9,26 @@ > > > #include > > > > > > #include "vkms_drv.h" > > > - > > > -static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer, > > > - const struct vkms_composer *composer) > > > -{ > > > - u32 pixel; > > > - int src_offset = composer->offset + (y * composer->pitch) > > > - + (x * composer->cpp); > > > - > > > - pixel = *(u32 *)&buffer[src_offset]; > > > - > > > - return pixel; > > > -} > > > +#include "vkms_formats.h" > > > + > > > +#define get_output_vkms_composer(buffer_pointer, composer) \ > > > + ((struct vkms_composer) { \ > > > + .fb = &(struct drm_framebuffer) { \ > > > + .format = &(struct drm_format_info) { \ > > > + .format = DRM_FORMAT_ARGB16161616, \ > > > + }, \ > > > > Is that really how one can initialize a drm_format_info? Does that > > struct not have a lot more fields? Shouldn't you call a function to > > look up the proper struct with all fields populated? > > I did this macro to just fill the necessary fields, and add more of them > as necessary. > > I was implementing something very similar to the algorithm that > you described below. So this macro
Re: [PATCH v2 6/8] drm: vkms: Refactor the plane composer to accept new formats
On Tue, Nov 9, 2021 at 8:40 AM Pekka Paalanen wrote: > > Hi Igor, > > again, that is a really nice speed-up. Unfortunately, I find the code > rather messy and hard to follow. I hope my comments below help with > re-designing it to be easier to understand. > > > On Tue, 26 Oct 2021 08:34:06 -0300 > Igor Torrente wrote: > > > Currently the blend function only accepts XRGB_ and ARGB_ > > as a color input. > > > > This patch refactors all the functions related to the plane composition > > to overcome this limitation. > > > > Now the blend function receives a struct `vkms_pixel_composition_functions` > > containing two handlers. > > > > One will generate a buffer of each line of the frame with the pixels > > converted to ARGB16161616. And the other will take this line buffer, > > do some computation on it, and store the pixels in the destination. > > > > Both the handlers have the same signature. They receive a pointer to > > the pixels that will be processed(`pixels_addr`), the number of pixels > > that will be treated(`length`), and the intermediate buffer of the size > > of a frame line (`line_buffer`). > > > > The first function has been totally described previously. > > What does this sentence mean? In the sentence "One will generate...", I give an overview of the two types of handlers. And the overview of the first handler describes the full behavior of it. But it doesn't look clear enough, I will improve it in the future. > > > > > The second is more interesting, as it has to perform two roles depending > > on where it is called in the code. > > > > The first is to convert(if necessary) the data received in the > > `line_buffer` and write in the memory pointed by `pixels_addr`. > > > > The second role is to perform the `alpha_blend`. So, it takes the pixels > > in the `line_buffer` and `pixels_addr`, executes the blend, and stores > > the result back to the `pixels_addr`. > > > > The per-line implementation was chosen for performance reasons. > > The per-pixel functions were having performance issues due to indirect > > function call overhead. > > > > The per-line code trades off memory for execution time. The `line_buffer` > > allows us to diminish the number of function calls. > > > > Results in the IGT test `kms_cursor_crc`: > > > > | Frametime | > > |:---:|:-:|:--:|::| > > | implmentation | Current | Per-pixel | Per-line | > > | frametime range | 8~22 ms | 32~56 ms | 6~19 ms | > > | Average | 10.0 ms | 35.8 ms | 8.6 ms | > > > > Reported-by: kernel test robot > > Signed-off-by: Igor Torrente > > --- > > V2: Improves the performance drastically, by perfoming the operations > > per-line and not per-pixel(Pekka Paalanen). > > Minor improvements(Pekka Paalanen). > > --- > > drivers/gpu/drm/vkms/vkms_composer.c | 321 --- > > drivers/gpu/drm/vkms/vkms_formats.h | 155 + > > 2 files changed, 342 insertions(+), 134 deletions(-) > > create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h > > > > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c > > b/drivers/gpu/drm/vkms/vkms_composer.c > > index 383ca657ddf7..69fe3a89bdc9 100644 > > --- a/drivers/gpu/drm/vkms/vkms_composer.c > > +++ b/drivers/gpu/drm/vkms/vkms_composer.c > > @@ -9,18 +9,26 @@ > > #include > > > > #include "vkms_drv.h" > > - > > -static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer, > > - const struct vkms_composer *composer) > > -{ > > - u32 pixel; > > - int src_offset = composer->offset + (y * composer->pitch) > > - + (x * composer->cpp); > > - > > - pixel = *(u32 *)&buffer[src_offset]; > > - > > - return pixel; > > -} > > +#include "vkms_formats.h" > > + > > +#define get_output_vkms_composer(buffer_pointer, composer) \ > > + ((struct vkms_composer) { \ > > + .fb = &(struct drm_framebuffer) { \ > > + .format = &(struct drm_format_info) { \ > > + .format = DRM_FORMAT_ARGB16161616, \ > > + }, \ > > Is that really how one can initialize a drm_format_info? Does that > struct not have a lot more fields? Shouldn't you call a function to > look up the proper struct with all fields populated? I did this macro to just fill the necessary fields, and add more of them as necessary. I was implementing something very similar to the algorithm that you described below. So this macro will not exist in the next version. > > > + }, \ > > + .map[0].vaddr = (buffer_pointer), \ > > + .src = (composer)->src, \ > > + .dst = (composer)->dst,
Re: [PATCH v2 6/8] drm: vkms: Refactor the plane composer to accept new formats
Hi Igor, again, that is a really nice speed-up. Unfortunately, I find the code rather messy and hard to follow. I hope my comments below help with re-designing it to be easier to understand. On Tue, 26 Oct 2021 08:34:06 -0300 Igor Torrente wrote: > Currently the blend function only accepts XRGB_ and ARGB_ > as a color input. > > This patch refactors all the functions related to the plane composition > to overcome this limitation. > > Now the blend function receives a struct `vkms_pixel_composition_functions` > containing two handlers. > > One will generate a buffer of each line of the frame with the pixels > converted to ARGB16161616. And the other will take this line buffer, > do some computation on it, and store the pixels in the destination. > > Both the handlers have the same signature. They receive a pointer to > the pixels that will be processed(`pixels_addr`), the number of pixels > that will be treated(`length`), and the intermediate buffer of the size > of a frame line (`line_buffer`). > > The first function has been totally described previously. What does this sentence mean? > > The second is more interesting, as it has to perform two roles depending > on where it is called in the code. > > The first is to convert(if necessary) the data received in the > `line_buffer` and write in the memory pointed by `pixels_addr`. > > The second role is to perform the `alpha_blend`. So, it takes the pixels > in the `line_buffer` and `pixels_addr`, executes the blend, and stores > the result back to the `pixels_addr`. > > The per-line implementation was chosen for performance reasons. > The per-pixel functions were having performance issues due to indirect > function call overhead. > > The per-line code trades off memory for execution time. The `line_buffer` > allows us to diminish the number of function calls. > > Results in the IGT test `kms_cursor_crc`: > > | Frametime | > |:---:|:-:|:--:|::| > | implmentation | Current | Per-pixel | Per-line | > | frametime range | 8~22 ms | 32~56 ms | 6~19 ms | > | Average | 10.0 ms | 35.8 ms | 8.6 ms | > > Reported-by: kernel test robot > Signed-off-by: Igor Torrente > --- > V2: Improves the performance drastically, by perfoming the operations > per-line and not per-pixel(Pekka Paalanen). > Minor improvements(Pekka Paalanen). > --- > drivers/gpu/drm/vkms/vkms_composer.c | 321 --- > drivers/gpu/drm/vkms/vkms_formats.h | 155 + > 2 files changed, 342 insertions(+), 134 deletions(-) > create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h > > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c > b/drivers/gpu/drm/vkms/vkms_composer.c > index 383ca657ddf7..69fe3a89bdc9 100644 > --- a/drivers/gpu/drm/vkms/vkms_composer.c > +++ b/drivers/gpu/drm/vkms/vkms_composer.c > @@ -9,18 +9,26 @@ > #include > > #include "vkms_drv.h" > - > -static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer, > - const struct vkms_composer *composer) > -{ > - u32 pixel; > - int src_offset = composer->offset + (y * composer->pitch) > - + (x * composer->cpp); > - > - pixel = *(u32 *)&buffer[src_offset]; > - > - return pixel; > -} > +#include "vkms_formats.h" > + > +#define get_output_vkms_composer(buffer_pointer, composer) \ > + ((struct vkms_composer) { \ > + .fb = &(struct drm_framebuffer) { \ > + .format = &(struct drm_format_info) { \ > + .format = DRM_FORMAT_ARGB16161616, \ > + }, \ Is that really how one can initialize a drm_format_info? Does that struct not have a lot more fields? Shouldn't you call a function to look up the proper struct with all fields populated? > + }, \ > + .map[0].vaddr = (buffer_pointer), \ > + .src = (composer)->src, \ > + .dst = (composer)->dst, \ > + .cpp = sizeof(u64), \ > + .pitch = drm_rect_width(&(composer)->dst) * sizeof(u64) \ > + }) Why is this a macro rather than a function? > + > +struct vkms_pixel_composition_functions { > + void (*get_src_line)(void *pixels_addr, int length, u64 *line_buffer); > + void (*set_output_line)(void *pixels_addr, int length, u64 > *line_buffer); I would be a little more comfortable if instead of u64 *line_buffer you would have something like struct line_buffer { u16 *row; size_t nelem; } so that the functions to be plugged into these function pointers could assert that you do not