[PATCH v2 11/13] gpu: ipu-ic: Add complete image conversion support with tiling

2016-08-12 Thread Philipp Zabel
On Wednesday, 2016-08-03 at 17:18 -0700, Steve Longerbeam wrote:
> On 08/01/2016 02:29 AM, Philipp Zabel wrote:
> > On Thursday, 2016-07-28 at 16:09 -0700, Steve Longerbeam wrote:
> >>> Now split the frame in half and suddenly pixel x' = 640 is the start of
> >>> a new tile, so it is sampled at x = 160, and pixel x' = 1279 will be
> >>> sampled at x = 160 + (1279 - 640) * 8192/32846. = 319.37, reading over
> >>> the edge of the source image.
> >> Here's where we part.
> >>
> >> The 320x200 --> 1280x800 conversion is split into two 160x200 -->
> >> 640x800 conversions. The DMA controller and ipu_ic_task_init() are given
> >> those width/height dimensions, not the dimensions of the original images.
> >> So this is simply two separate 160x200 --> 640x800 conversions. The only
> >> difference from a true 160x200 --> 640x800 image conversion is that the DMA
> >> controller must be given the stride lengths of the original 320x200
> >> and 1280x800 images.
> >>
> >> The rsc for the 160x200 --> 640x800 conversions is
> >>
> >> x = x' * (160-1)/(640-1) = x' * 8192/rsc, so rsc = 32923
> >>
> >>
> >> So original horizontal position 640 is really x' = 0 of the second
> >> conversion, which is sampled at x = 0 of the second conversion. And the
> >> pixel at x' = 1279 is really x' = 639 of the second conversion, which is
> >> sampled at x = 639 * 8192/32923 = 158.98, which does not read over the
> >> edge of the source tile.
> > My bad, I somehow thought that the scaling factor is calculated per
> > image (as it IMHO should be), not just per tile.
> >
> > Of course in that case you won't ever read over the edge, but on the
> > other hand the visual problems are worse because you underestimate the
> > scaling factor and introduce a sharp edge at the center: even if the
> > source pixel step per target pixel step is a fraction, between pixels
> > width/2-1 and width/2 there's always a whole source pixel step.
> >
> > Take the extreme example of scaling 32x32 to 1080x1080 pixels. The ideal
> > source pixels for x' = 519 and 520 should be x = 14.911 and 14.939,
> > respectively. Due to tiling they will be x = 15 and 16, introducing a
> > sharp seam in the otherwise blurry mess.
> 
> I think you mean at x' = 539 and x' = 540.
> 
> But yes I agree. Due to tiling, at x' = 539, the input pixel is sampled at
> x = 15. If the interpolation were to continue (no tiling), at x' = 540, the
> input pixel would be sampled at (31/1079)*540 = 15.514. Instead, because of
> tiling, there is a discontinuity in the interpolation (it is reset),
> beginning again at x' = 0 (540), which is sampled at x = 0 (16).
> 
> The only way I can think of to resolve this problem is to add some width
> to the output tiles such that the interpolation completes a full span
> between input position w - 2 and w - 1. That is, add to w' until
> floor(F*w') increments to the next whole integer, where F = (w-1)/(w'-1)
> is the scaling factor.
> 
> But that will likely cause the next tile DMA addrs to fail to fall on the
> IDMAC 8 byte alignment.

I always wanted to have a look at the scroll feature; maybe SX can be
used to start at odd pixels?

regards
Philipp



[PATCH v2 11/13] gpu: ipu-ic: Add complete image conversion support with tiling

2016-08-03 Thread Steve Longerbeam
On 08/01/2016 02:29 AM, Philipp Zabel wrote:
> On Thursday, 2016-07-28 at 16:09 -0700, Steve Longerbeam wrote:
>>> Now split the frame in half and suddenly pixel x' = 640 is the start of
>>> a new tile, so it is sampled at x = 160, and pixel x' = 1279 will be
>>> sampled at x = 160 + (1279 - 640) * 8192/32846. = 319.37, reading over
>>> the edge of the source image.
>> Here's where we part.
>>
>> The 320x200 --> 1280x800 conversion is split into two 160x200 -->
>> 640x800 conversions. The DMA controller and ipu_ic_task_init() are given
>> those width/height dimensions, not the dimensions of the original images.
>> So this is simply two separate 160x200 --> 640x800 conversions. The only
>> difference from a true 160x200 --> 640x800 image conversion is that the DMA
>> controller must be given the stride lengths of the original 320x200
>> and 1280x800 images.
>>
>> The rsc for the 160x200 --> 640x800 conversions is
>>
>> x = x' * (160-1)/(640-1) = x' * 8192/rsc, so rsc = 32923
>>
>>
>> So original horizontal position 640 is really x' = 0 of the second
>> conversion, which is sampled at x = 0 of the second conversion. And the
>> pixel at x' = 1279 is really x' = 639 of the second conversion, which is
>> sampled at x = 639 * 8192/32923 = 158.98, which does not read over the
>> edge of the source tile.
> My bad, I somehow thought that the scaling factor is calculated per
> image (as it IMHO should be), not just per tile.
>
> Of course in that case you won't ever read over the edge, but on the
> other hand the visual problems are worse because you underestimate the
> scaling factor and introduce a sharp edge at the center: even if the
> source pixel step per target pixel step is a fraction, between pixels
> width/2-1 and width/2 there's always a whole source pixel step.
>
> Take the extreme example of scaling 32x32 to 1080x1080 pixels. The ideal
> source pixels for x' = 519 and 520 should be x = 14.911 and 14.939,
> respectively. Due to tiling they will be x = 15 and 16, introducing a
> sharp seam in the otherwise blurry mess.

I think you mean at x' = 539 and x' = 540.

But yes I agree. Due to tiling, at x' = 539, the input pixel is sampled at
x = 15.
If the interpolation were to continue (no tiling), at x' = 540, the input pixel
would be sampled at (31/1079)*540 = 15.514. Instead, because of tiling,
there is a discontinuity in the interpolation (it is reset), beginning again at
x' = 0 (540), which is sampled at x = 0 (16).
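
Here is that discontinuity as a small standalone program (plain C, using
the ideal x = x' * 31/1079 mapping versus a per-tile mapping for two
16 -> 540 tiles; the 8192/rsc quantization is ignored here):

#include <stdio.h>

int main(void)
{
        int xp;

        /* output pixels around the centre seam of a 32 -> 1080 upscale */
        for (xp = 538; xp <= 541; xp++) {
                double ideal = xp * 31.0 / 1079.0;  /* whole-image mapping */
                int tile = xp / 540;                /* 0 or 1 */
                /* per-tile mapping: 16 -> 540, factor (16-1)/(540-1) */
                double tiled = tile * 16 + (xp - tile * 540) * 15.0 / 539.0;

                printf("x' = %3d: ideal x = %6.3f, tiled x = %6.3f\n",
                       xp, ideal, tiled);
        }
        return 0;
}

The ideal positions advance by about 0.029 source pixels per output pixel,
while the tiled positions jump from 15.000 to 16.000 at x' = 540, a full
source pixel step right at the seam.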

The only way I can think of to resolve this problem is to add some width
to the output tiles such that the interpolation completes a full span between
input position w - 2 and w - 1. That is, add to w' until floor(F*w') increments
to the next whole integer, where F = (w-1)/(w'-1) is the scaling factor.

But that will likely cause the next tile DMA addrs to fail to fall on the IDMAC
8 byte alignment.
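
Roughly, as a sketch of that idea (one possible reading of it, untested and
ignoring the IDMAC alignment problem; the helper below is hypothetical, not
part of the patch):

#include <math.h>
#include <stdio.h>

/*
 * Take the nominal tile scale factor F = (w-1)/(w'-1) and add to the
 * output tile width w' until floor(F*w') steps to the next whole input
 * pixel, so the interpolation completes a full span at the tile edge.
 */
static int widen_out_tile(int w, int w_prime)
{
        double f = (double)(w - 1) / (w_prime - 1);
        int limit = floor(f * w_prime);

        while (floor(f * w_prime) == limit)
                w_prime++;

        return w_prime;
}

int main(void)
{
        /* the 32x32 -> 1080x1080 example: nominal tile is 16 -> 540 */
        printf("widened output tile width: %d\n", widen_out_tile(16, 540));
        return 0;
}

For the 32 -> 1080 case this widens the first output tile from 540 to 575
pixels, which shows the alignment concern: a 575-pixel offset does not keep
the next tile's start address 8-byte aligned for typical pixel formats.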


>
>
>> That said, I _have_ noticed seams, but I have always attributed them to the
>> fact that we have a discontinuity in color-space conversion and/or resize
>> interpolation at the boundary between tiles.
>>
>> I've also found that the seams are quite noticeable when rendered to a
>> display overlay, but become significantly less pronounced if the images are
>> converted to a back buffer, and then page-flipped to front buffer when the
>> conversion (all tiles) completes.
> I don't know what to make of this. Maybe it is a timing issue and what
> you are actually seeing is tearing between tiles of different frames?

Yes, that's always been my assumption: a scan-out contains a mix
of tiles from different frames when page-flip is not used.

Steve



[PATCH v2 11/13] gpu: ipu-ic: Add complete image conversion support with tiling

2016-08-01 Thread Philipp Zabel
On Thursday, 2016-07-28 at 16:09 -0700, Steve Longerbeam wrote:
> > Now split the frame in half and suddenly pixel x' = 640 is the start of
> > a new tile, so it is sampled at x = 160, and pixel x' = 1279 will be
> > sampled at x = 160 + (1279 - 640) * 8192/32846. = 319.37, reading over
> > the edge of the source image.
> 
> Here's where we part.
> 
> The 320x200 --> 1280x800 conversion is split into two 160x200 -->
> 640x800 conversions. The DMA controller and ipu_ic_task_init() are given
> those width/height dimensions, not the dimensions of the original images.
> So this is simply two separate 160x200 --> 640x800 conversions. The only
> difference from a true 160x200 --> 640x800 image conversion is that the DMA
> controller must be given the stride lengths of the original 320x200
> and 1280x800 images.
> 
> The rsc for the 160x200 --> 640x800 conversions is
> 
> x = x' * (160-1)/(640-1) = x' * 8192/rsc, so rsc = 32923
> 
> 
> So original horizontal position 640 is really x' = 0 of the second
> conversion, which is sampled at x = 0 of the second conversion. And the
> pixel at x' = 1279 is really x' = 639 of the second conversion, which is
> sampled at x = 639 * 8192/32923 = 158.98, which does not read over the
> edge of the source tile.

My bad, I somehow thought that the scaling factor is calculated per
image (as it IMHO should be), not just per tile.

Of course in that case you won't ever read over the edge, but on the
other hand the visual problems are worse because you underestimate the
scaling factor and introduce a sharp edge at the center: even if the
source pixel step per target pixel step is a fraction, between pixels
width/2-1 and width/2 there's always a whole source pixel step.

Take the extreme example of scaling 32x32 to 1080x1080 pixels. The ideal
source pixels for x' = 519 and 520 should be x = 14.911 and 14.939,
respectively. Due to tiling they will be x = 15 and 16, introducing a
sharp seam in the otherwise blurry mess.

> > This problem gets worse if you start using arbitrary frame sizes and YUV
> > planar images and consider that tile start addresses are (currently)
> > limited to 8 byte boundaries, to the point that there are very visible
> > seams in the center of the image, depending on scaling factor and image
> > sizes.
> 
> Indeed there could be other parameters that would cause the resizer to read
> past the edge of the source tiles, I will need to try and find such cases.
> But not in the above case.

Ok.

> That said, I _have_ noticed seams, but I have always attributed them to the
> fact that we have a discontinuity in color-space conversion and/or resize
> interpolation at the boundary between tiles.
>
> I've also found that the seams are quite noticeable when rendered to a
> display overlay, but become significantly less pronounced if the images are
> converted to a back buffer, and then page-flipped to front buffer when the
> conversion (all tiles) completes.

I don't know what to make of this. Maybe it is a timing issue and what
you are actually seeing is tearing between tiles of different frames?

regards
Philipp



[PATCH v2 11/13] gpu: ipu-ic: Add complete image conversion support with tiling

2016-07-28 Thread Steve Longerbeam
Hi Philipp,

On 07/26/2016 03:08 AM, Philipp Zabel wrote:
>
>>   
>> +/*
>> + * The IC Resizer has a restriction that the output frame from the
>> + * resizer must be 1024 or less in both width (pixels) and height
>> + * (lines).
>> + *
>> + * The image conversion support attempts to split up a conversion when
>> + * the desired output (converted) frame resolution exceeds the IC resizer
>> + * limit of 1024 in either dimension.
>> + *
>> + * If either dimension of the output frame exceeds the limit, the
>> + * dimension is split into 1, 2, or 4 equal stripes,
> Imagine converting a 320x200 image up to 1280x800, and consider only the
> x coordinate. The IC upscaler is a simple bilinear scaler.

Right, the upscaling is a simple linear interpolation between two
adjacent input pixels, just paraphrasing you.

> We want target pixel x' = 1279 to sample the source pixel x = 319, so
> the scaling factor rsc is calculated to:
>
> x = x' * (320-1)/(1280-1) = x' * 8192/rsc, with rsc = 32846
>
> That means that the target pixels x' = 639 and x' = 640 should be
> sampled (bilinearly) from x = 639 * 8192/32846. = 159.37 and x = 640 *
> 8192/32846. = 159.62, respectively.

I'm with you so far.

>
> Now split the frame in half and suddenly pixel x' = 640 is the start of
> a new tile, so it is sampled at x = 160, and pixel x' = 1279 will be
> sampled at x = 160 + (1279 - 640) * 8192/32846. = 319.37, reading over
> the edge of the source image.

Here's where we part.

The 320x200 --> 1280x800 conversion is split into two 160x200 -->
640x800 conversions. The DMA controller and ipu_ic_task_init() are given
those width/height dimensions, not the dimensions of the original images.
So this is simply two separate 160x200 --> 640x800 conversions. The only
difference from a true 160x200 --> 640x800 image conversion is that the DMA
controller must be given the stride lengths of the original 320x200 and
1280x800 images.

The rsc for the 160x200 --> 640x800 conversions is

x = x' * (160-1)/(640-1) = x' * 8192/rsc, so rsc = 32923


So original horizontal position 640 is really x' = 0 of the second
conversion, which is sampled at x = 0 of the second conversion. And the
pixel at x' = 1279 is really x' = 639 of the second conversion, which is
sampled at x = 639 * 8192/32923 = 158.98, which does not read over the
edge of the source tile.
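
For reference, the same arithmetic as a small standalone program (plain C,
using the x = x' * 8192 / rsc model with rsc = 32923 as above, nothing
driver-specific):

#include <stdio.h>

int main(void)
{
        const unsigned int rsc = 32923; /* per 160 -> 640 tile, as above */
        int tile, xp;

        for (tile = 0; tile < 2; tile++) {
                /* first and last output pixel of each 640-wide tile */
                for (xp = 0; xp <= 639; xp += 639) {
                        double x = xp * 8192.0 / rsc;   /* tile-local x */

                        printf("tile %d: x' = %3d (global %4d) -> x = %7.3f (global %7.3f)\n",
                               tile, xp, tile * 640 + xp,
                               x, tile * 160 + x);
                }
        }
        return 0;
}

The last output pixel of each tile samples just below x = 159, i.e. the
interpolation never reads past the right edge of its 160-pixel source tile.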


> This problem gets worse if you start using arbitrary frame sizes and YUV
> planar images and consider that tile start addresses are (currently)
> limited to 8 byte boundaries, to the point that there are very visible
> seams in the center of the image, depending on scaling factor and image
> sizes.

Indeed there could be other parameters that would cause the resizer to read
past the edge of the source tiles, I will need to try and find such cases.
But not in the above case.

That said, I _have_ noticed seams, but I have always attributed them to the
fact that we have a discontinuity in color-space conversion and/or resize
interpolation at the boundary between tiles.

I've also found that the seams are quite noticeable when rendered to a
display overlay, but become significantly less pronounced if the images are
converted to a back buffer, and then page-flipped to front buffer when the
conversion (all tiles) completes.

Steve

> I wonder how much effort it would be to remove the tiling code for now
> and add it back in a second step once it is fixed? Otherwise we could
> just disallow scaled tiling for now, but I'd like this to be prepared
> for tiles with different sizes at least, before merging.
>
> regards
> Philipp
>



[PATCH v2 11/13] gpu: ipu-ic: Add complete image conversion support with tiling

2016-07-26 Thread Philipp Zabel
On Tuesday, 2016-07-19 at 18:11 -0700, Steve Longerbeam wrote:
> This patch implements complete image conversion support to ipu-ic,
> with tiling to support scaling to and from images up to 4096x4096.
> Image rotation is also supported.
> 
> The internal API is subsystem agnostic (no V4L2 dependency except
> for the use of V4L2 fourcc pixel formats).
> 
> Callers prepare for image conversion by calling
> ipu_image_convert_prepare(), which initializes the parameters of
> the conversion. The caller passes in the ipu_ic task to use for
> the conversion, the input and output image formats, a rotation mode,
> and a completion callback and completion context pointer:
> 
> struct image_converter_ctx *
> ipu_image_convert_prepare(struct ipu_ic *ic,
>   struct ipu_image *in, struct ipu_image *out,
>   enum ipu_rotate_mode rot_mode,
>   image_converter_cb_t complete,
>   void *complete_context);
> 
> The caller is given a new conversion context that must be passed to
> the further APIs:
> 
> struct image_converter_run *
> ipu_image_convert_run(struct image_converter_ctx *ctx,
>   dma_addr_t in_phys, dma_addr_t out_phys);
> 
> This queues a new image conversion request to a run queue, and
> starts the conversion immediately if the run queue is empty. Only
> the physaddr's of the input and output image buffers are needed,
> since the conversion context was created previously with
> ipu_image_convert_prepare(). Returns a new run object pointer. When
> the conversion completes, the run pointer is returned to the
> completion callback.
> 
> void image_convert_abort(struct image_converter_ctx *ctx);
> 
> This will abort any active or pending conversions for this context.
> Any currently active or pending runs belonging to this context are
> returned via the completion callback with an error status.
> 
> void ipu_image_convert_unprepare(struct image_converter_ctx *ctx);
> 
> Unprepares the conversion context. Any active or pending runs will
> be aborted by calling image_convert_abort().
> 
> Signed-off-by: Steve Longerbeam 
> ---
>  drivers/gpu/ipu-v3/ipu-ic.c | 1691 ++-
>  include/video/imx-ipu-v3.h  |   57 +-
>  2 files changed, 1736 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/ipu-v3/ipu-ic.c b/drivers/gpu/ipu-v3/ipu-ic.c
> index 1a37afc..5471b72 100644
> --- a/drivers/gpu/ipu-v3/ipu-ic.c
> +++ b/drivers/gpu/ipu-v3/ipu-ic.c
> @@ -17,6 +17,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include "ipu-prv.h"
>  
>  /* IC Register Offsets */
> @@ -82,6 +84,40 @@
>  #define IC_IDMAC_3_PP_WIDTH_MASK    (0x3ff << 20)
>  #define IC_IDMAC_3_PP_WIDTH_OFFSET  20
>  
> +/*
> + * The IC Resizer has a restriction that the output frame from the
> + * resizer must be 1024 or less in both width (pixels) and height
> + * (lines).
> + *
> + * The image conversion support attempts to split up a conversion when
> + * the desired output (converted) frame resolution exceeds the IC resizer
> + * limit of 1024 in either dimension.
> + *
> + * If either dimension of the output frame exceeds the limit, the
> + * dimension is split into 1, 2, or 4 equal stripes, 

Imagine converting a 320x200 image up to 1280x800, and consider only the
x coordinate. The IC upscaler is a simple bilinear scaler.

We want target pixel x' = 1279 to sample the source pixel x = 319, so
the scaling factor rsc is calculated to:

x = x' * (320-1)/(1280-1) = x' * 8192/rsc, with rsc = 32846

That means that the target pixels x' = 639 and x' = 640 should be
sampled (bilinearly) from x = 639 * 8192/32846. = 159.37 and x = 640 *
8192/32846. = 159.62, respectively.

Now split the frame in half and suddenly pixel x' = 640 is the start of
a new tile, so it is sampled at x = 160, and pixel x' = 1279 will be
sampled at x = 160 + (1279 - 640) * 8192/32846. = 319.37, reading over
the edge of the source image.
This problem gets worse if you start using arbitrary frame sizes and YUV
planar images and consider that tile start addresses are (currently)
limited to 8 byte boundaries, to the point that there are very visible
seams in the center of the image, depending on scaling factor and image
sizes.
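
For reference, a small standalone program that reproduces the numbers above
(plain C, nothing IPU-specific; rsc = 32846 is simply taken from the example,
and the "split frame" case keeps that whole-frame rsc but restarts sampling
at x = 160 for the second tile):

#include <stdio.h>

int main(void)
{
        const unsigned int rsc = 32846; /* for 320 -> 1280, as above */
        int xp;

        /* whole-frame sampling around the would-be tile boundary */
        for (xp = 639; xp <= 640; xp++)
                printf("whole frame: x' = %4d samples x = %.2f\n",
                       xp, xp * 8192.0 / rsc);

        /* last pixel of a right half that reuses the whole-frame rsc */
        printf("split frame: x' = 1279 samples x = %.2f (source is 0..319)\n",
               160 + (1279 - 640) * 8192.0 / rsc);

        return 0;
}

This prints 159.37, 159.62 and 319.37, the last value lying past the right
edge of the 320-pixel source line.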

I wonder how much effort it would be to remove the tiling code for now
and add it back in a second step once it is fixed? Otherwise we could
just disallow scaled tiling for now, but I'd like this to be prepared
for tiles with different sizes at least, before merging.

regards
Philipp



[PATCH v2 11/13] gpu: ipu-ic: Add complete image conversion support with tiling

2016-07-19 Thread Steve Longerbeam
This patch implements complete image conversion support to ipu-ic,
with tiling to support scaling to and from images up to 4096x4096.
Image rotation is also supported.

The internal API is subsystem agnostic (no V4L2 dependency except
for the use of V4L2 fourcc pixel formats).

Callers prepare for image conversion by calling
ipu_image_convert_prepare(), which initializes the parameters of
the conversion. The caller passes in the ipu_ic task to use for
the conversion, the input and output image formats, a rotation mode,
and a completion callback and completion context pointer:

struct image_converter_ctx *
ipu_image_convert_prepare(struct ipu_ic *ic,
  struct ipu_image *in, struct ipu_image *out,
  enum ipu_rotate_mode rot_mode,
  image_converter_cb_t complete,
  void *complete_context);

The caller is given a new conversion context that must be passed to
the further APIs:

struct image_converter_run *
ipu_image_convert_run(struct image_converter_ctx *ctx,
  dma_addr_t in_phys, dma_addr_t out_phys);

This queues a new image conversion request to a run queue, and
starts the conversion immediately if the run queue is empty. Only
the physaddr's of the input and output image buffers are needed,
since the conversion context was created previously with
ipu_image_convert_prepare(). Returns a new run object pointer. When
the conversion completes, the run pointer is returned to the
completion callback.

void image_convert_abort(struct image_converter_ctx *ctx);

This will abort any active or pending conversions for this context.
Any currently active or pending runs belonging to this context are
returned via the completion callback with an error status.

void ipu_image_convert_unprepare(struct image_converter_ctx *ctx);

Unprepares the conversion context. Any active or pending runs will
be aborted by calling image_convert_abort().
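
A minimal usage sketch (hypothetical caller code, not part of this patch;
the completion-callback signature and the ERR_PTR-style error returns shown
here are assumptions, and it is only meant to illustrate the
prepare/run/unprepare sequence):

#include <linux/completion.h>
#include <linux/err.h>
#include <video/imx-ipu-v3.h>

/* assumed callback signature: the run and an error status are passed back */
static void convert_done(void *context, struct image_converter_run *run,
                         int err)
{
        complete((struct completion *)context);
}

static int convert_one_frame(struct ipu_ic *ic,
                             struct ipu_image *in, struct ipu_image *out,
                             dma_addr_t in_phys, dma_addr_t out_phys)
{
        DECLARE_COMPLETION_ONSTACK(done);
        struct image_converter_ctx *ctx;
        struct image_converter_run *run;

        ctx = ipu_image_convert_prepare(ic, in, out, IPU_ROTATE_NONE,
                                        convert_done, &done);
        if (IS_ERR(ctx))
                return PTR_ERR(ctx);

        run = ipu_image_convert_run(ctx, in_phys, out_phys);
        if (IS_ERR(run)) {
                ipu_image_convert_unprepare(ctx);
                return PTR_ERR(run);
        }

        /* the completion callback fires once all tiles have been converted */
        wait_for_completion(&done);
        ipu_image_convert_unprepare(ctx);
        return 0;
}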

Signed-off-by: Steve Longerbeam 
---
 drivers/gpu/ipu-v3/ipu-ic.c | 1691 ++-
 include/video/imx-ipu-v3.h  |   57 +-
 2 files changed, 1736 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/ipu-v3/ipu-ic.c b/drivers/gpu/ipu-v3/ipu-ic.c
index 1a37afc..5471b72 100644
--- a/drivers/gpu/ipu-v3/ipu-ic.c
+++ b/drivers/gpu/ipu-v3/ipu-ic.c
@@ -17,6 +17,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include "ipu-prv.h"

 /* IC Register Offsets */
@@ -82,6 +84,40 @@
 #define IC_IDMAC_3_PP_WIDTH_MASK    (0x3ff << 20)
 #define IC_IDMAC_3_PP_WIDTH_OFFSET  20

+/*
+ * The IC Resizer has a restriction that the output frame from the
+ * resizer must be 1024 or less in both width (pixels) and height
+ * (lines).
+ *
+ * The image conversion support attempts to split up a conversion when
+ * the desired output (converted) frame resolution exceeds the IC resizer
+ * limit of 1024 in either dimension.
+ *
+ * If either dimension of the output frame exceeds the limit, the
+ * dimension is split into 1, 2, or 4 equal stripes, for a maximum
+ * of 4*4 or 16 tiles. A conversion is then carried out for each
+ * tile (but taking care to pass the full frame stride length to
+ * the DMA channel's parameter memory!). IDMA double-buffering is used
+ * to convert each tile back-to-back when possible (see note below
+ * when double_buffering boolean is set).
+ *
+ * Note that the input frame must be split up into the same number
+ * of tiles as the output frame.
+ */
+#define MAX_STRIPES_W    4
+#define MAX_STRIPES_H    4
+#define MAX_TILES (MAX_STRIPES_W * MAX_STRIPES_H)
+
+#define MIN_W 128
+#define MIN_H 128
+#define MAX_W 4096
+#define MAX_H 4096
+
+enum image_convert_type {
+   IMAGE_CONVERT_IN = 0,
+   IMAGE_CONVERT_OUT,
+};
+
 struct ic_task_regoffs {
u32 rsc;
u32 tpmem_csc[2];
@@ -96,6 +132,16 @@ struct ic_task_bitfields {
u32 ic_cmb_galpha_bit;
 };

+struct ic_task_channels {
+   int in;
+   int out;
+   int rot_in;
+   int rot_out;
+   int vdi_in_p;
+   int vdi_in;
+   int vdi_in_n;
+};
+
 static const struct ic_task_regoffs ic_task_reg[IC_NUM_TASKS] = {
[IC_TASK_ENCODER] = {
.rsc = IC_PRP_ENC_RSC,
@@ -138,12 +184,159 @@ static const struct ic_task_bitfields ic_task_bit[IC_NUM_TASKS] = {
},
 };

+static const struct ic_task_channels ic_task_ch[IC_NUM_TASKS] = {
+   [IC_TASK_ENCODER] = {
+   .out = IPUV3_CHANNEL_IC_PRP_ENC_MEM,
+   .rot_in = IPUV3_CHANNEL_MEM_ROT_ENC,
+   .rot_out = IPUV3_CHANNEL_ROT_ENC_MEM,
+   },
+   [IC_TASK_VIEWFINDER] = {
+   .in = IPUV3_CHANNEL_MEM_IC_PRP_VF,
+   .out = IPUV3_CHANNEL_IC_PRP_VF_MEM,
+   .rot_in = IPUV3_CHANNEL_MEM_ROT_VF,
+   .rot_out = IPUV3_CHANNEL_ROT_VF_MEM,
+   .vdi_in_p = IPUV3_CHANNEL_MEM_VDI_PREV,
+   .vdi_in = IPUV3_CHANNEL_MEM_VDI_CUR,
+   .vdi_