Re: [PATCH v2 00/16] i.MX media mem2mem scaler

2018-07-23 Thread Philipp Zabel
Hi Steve,

On Sun, 2018-07-22 at 11:30 -0700, Steve Longerbeam wrote:
[...]
> To aid in debugging this I created branch 'imx-mem2mem.stevel' in my
> mediatree fork on github. I moved the mem2mem driver to the beginning
> and added a few patches:
> 
> d317a7771c ("gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma buffers")
> b4362162c0 ("media: imx: mem2mem: Use ipu_image_convert_adjust in try 
> format")
> 4758be0cf8 ("gpu: ipu-v3: image-convert: Fix width/height alignment")
> d069163c7f ("gpu: ipu-v3: image-convert: Fix input bytesperline clamp in 
> adjust")
> 
> (feel free to squash some of those if you agree with them for v3).

Thank you, I've squashed them where it made sense:

- "media: imx: mem2mem: Use ipu_image_convert_adjust in try format"
  into "media: imx: add mem2mem device" so it could be merged
  independently,
- "gpu: ipu-v3: image-convert: Fix width/height alignment" into
  "gpu: ipu-v3: image-convert: relax alignment restrictions", which
  itself is squashed together from "gpu: ipu-v3: image-convert: relax
  input alignment restrictions" and "gpu: ipu-v3: image-convert: relax
  output alignment restrictions", and
- "gpu: ipu-v3: image-convert: Fix input bytesperline clamp in adjust"
  into "gpu: ipu-v3: image-convert: fix bytesperline adjustment".

I've added some fixes and limited output tile top/left alignment to 8x8
IRT block size if the rotator is being used, and dropped the current
state into this branch:

  git://git.pengutronix.de/pza/linux imx-mem2mem

regards
Philipp


Re: [PATCH v2 00/16] i.MX media mem2mem scaler

2018-07-23 Thread Philipp Zabel
On Mon, 2018-07-23 at 11:29 +0200, Philipp Zabel wrote:
[...]
> > Also, I'm trying to parse the functions find_best_seam() and 
> > find_seams(). Can
> > you provide some more background on the behavior of those functions?
> 
> The hardware limits us to restart linear sampling at zero with each
> tile, so find_seams() tries to find the (horizontal and vertical) output
> positions where the corresponding input sampling positions are closest
> to integer values.
> The distance between the ideal fractional input sampling position and
> the actual integer sampling position that can be achieved is the amount
> of distortion we have to introduce (by slightly stretching one input
> tile and slightly shrinking the other) to completely hide the visible
> seams.

Actually, this is not all of it. In addition to being an integer, the
input sampling position at seam start is still subject to alignment
restrictions, so the actual value that is minimized is the difference
between the ideal fractional input sampling position and the closest
aligned input position.

regards
Philipp


Re: [PATCH v2 00/16] i.MX media mem2mem scaler

2018-07-23 Thread Philipp Zabel
Hi Steve,

On Sun, 2018-07-22 at 11:30 -0700, Steve Longerbeam wrote:
> Hi Philipp,
> 
> On 07/19/2018 08:30 AM, Philipp Zabel wrote:
> > Hi,
> > 
> > this is the second version of the i.MX mem2mem scaler series.
> > Patches 8 and 16 have been modified.
> > 
> > Changes since v1:
> >   - Fix inverted allow_overshoot logic
> >   - Correctly switch horizontal / vertical tile alignment when
> > determining seam positions with the 90° rotator active.
> 
> Yes, this fixes the specific rotation test that was broken
> (720x480, UYVY --> 1280x768, UYVY, rotate 90).
> 
> But running more tests on this v2 reveals more issues. I chose a
> somewhat random upscaling-only example as a first try:
> 
> 640x480, YV12 --> full HD 2560x1600, YV12 (no rotation or flip).
>
> This produces division by zero backtraces and the conversion hangs:
> 
> 
> [  131.079978] Division by zero in kernel.
[...]

Thanks, find_best_seam() breaks because it is fed the wrong bottom edge:

--8<--
diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c 
b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 726e3b7390c7..0c47d39adf03 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -806,7 +801,7 @@ static void find_seams(struct ipu_image_convert_ctx *ctx,
/* Start within 1024 lines of the bottom edge */
out_start = max_t(int, 0, out_bottom - 1024);
/* End before having to add more rows above */
-   out_end = min_t(unsigned int, out_right, row * 1024);
+   out_end = min_t(unsigned int, out_bottom, row * 1024);
 
find_best_seam(ctx, out_start, out_end,
   in_top_align, out_top_align, out_height_align,
-->8--

Also we unnecessarily use four tile columns instead of three:

--8<--
diff --git a/drivers/gpu/ipu-v3/ipu-image-convert.c 
b/drivers/gpu/ipu-v3/ipu-image-convert.c
index 726e3b7390c7..0c47d39adf03 100644
--- a/drivers/gpu/ipu-v3/ipu-image-convert.c
+++ b/drivers/gpu/ipu-v3/ipu-image-convert.c
@@ -380,12 +380,7 @@ static int alloc_dma_buf(struct ipu_image_convert_priv 
*priv,
 
 static inline int num_stripes(int dim)
 {
-   if (dim <= 1024)
-   return 1;
-   else if (dim <= 2048)
-   return 2;
-   else
-   return 4;
+   return (dim - 1) / 1024 + 1;
 }
 
 /*
-->8--

With that fixed, your test case succeeds.

Unfortunately, just adding rotate=90 makes it hang again. I'll
investigate.

> To aid in debugging this I created branch 'imx-mem2mem.stevel' in my
> mediatree fork on github. I moved the mem2mem driver to the beginning
> and added a few patches:
> 
> d317a7771c ("gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma buffers")
> b4362162c0 ("media: imx: mem2mem: Use ipu_image_convert_adjust in try 
> format")
> 4758be0cf8 ("gpu: ipu-v3: image-convert: Fix width/height alignment")
> d069163c7f ("gpu: ipu-v3: image-convert: Fix input bytesperline clamp in 
> adjust")
> 
> (feel free to squash some of those if you agree with them for v3).
>
> By moving the mem2mem driver before the seam avoidance patches, and making
> it independent of the image converter implementation, the driver can be 
> tested with
> and without the seam avoidance changes.

Yes, this makes sense to me. If we merge the mem2mem driver before the
image-convert changes go in, it should be limited to 1024x1024 output,
but if we manage to merge both parts in the same cycle, this should be
fine.

[...]
> Also, I'm trying to parse the functions find_best_seam() and 
> find_seams(). Can
> you provide some more background on the behavior of those functions?

The hardware limits us to restart linear sampling at zero with each
tile, so find_seams() tries to find the (horizontal and vertical) output
positions where the corresponding input sampling positions are closest
to integer values.
The distance between the ideal fractional input sampling position and
the actual integer sampling position that can be achieved is the amount
of distortion we have to introduce (by slightly stretching one input
tile and slightly shrinking the other) to completely hide the visible
seams.

find_best_seam() contains the code to find the left (or top) edge for a
single column (or row) that minimizes this distortion, given the right
(or bottom) edge, scaling factor, alignment restrictions, and allowed
range. The range is limited by the maximum tile width (or height).

find_seams() first iterates over all columns, right to left, and calls
find_best_seam() for each column. Each found seam then serves as the
right edge of the next column. Then it iterates over all rows, bottom to
top, again calling find_best_seam() for each row.

The reason we start at the bottom/right edges is that we have to make
sure that burst size / rotator block size align with the bottom/right
edge of the output frame.

regards
Philipp


Re: [PATCH v2 00/16] i.MX media mem2mem scaler

2018-07-22 Thread Steve Longerbeam




On 07/22/2018 11:30 AM, Steve Longerbeam wrote:

Hi Philipp,


On 07/19/2018 08:30 AM, Philipp Zabel wrote:

Hi,

this is the second version of the i.MX mem2mem scaler series.
Patches 8 and 16 have been modified.

Changes since v1:
  - Fix inverted allow_overshoot logic
  - Correctly switch horizontal / vertical tile alignment when
    determining seam positions with the 90° rotator active.


Yes, this fixes the specific rotation test that was broken
(720x480, UYVY --> 1280x768, UYVY, rotate 90).

But running more tests on this v2 reveals more issues. I chose a
somewhat random upscaling-only example as a first try:

640x480, YV12 --> full HD 2560x1600, YV12 (no rotation or flip).

This produces division by zero backtraces and the conversion hangs:



The hang is apparently because the conversion is re-attempted over and
over again, with an endless WARN() from
drivers/media/common/videobuf2/videobuf2-core.c:900.

I fixed the hang with an additional patch:

50026cbe08 ("media: imx: mem2mem: Remove buffers on device_run failures")

With this the conversion completes, but the below div-by-zero errors
persist, and the resultant image is blank.

Steve



[  131.079978] Division by zero in kernel.
[  131.083853] CPU: 0 PID: 683 Comm: mx6-m2m Tainted: G W 
4.18.0-rc2-13448-g678218d #7

[  131.092830] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  131.099372] Backtrace:
[  131.101858] [] (dump_backtrace) from [] 
(show_stack+0x18/0x1c)

[  131.109450]  r7: r6:600f0013 r5: r4:c107db3c
[  131.115135] [] (show_stack) from [] 
(dump_stack+0xb4/0xe8)
[  131.122380] [] (dump_stack) from [] 
(__div0+0x18/0x20)
[  131.129274]  r9:ec37d800 r8:0003 r7: r6: 
r5: r4:ec37dae8

[  131.137036] [] (__div0) from [] (Ldiv0+0x8/0x10)
[  131.143425] [] (ipu_image_convert_prepare) from 
[] (mem2mem_start_streaming+0xe0/0x1c0)
[  131.153186]  r10:c0b9f640 r9:c071958c r8:0280 r7:01e0 
r6:32315659 r5:c1008908

[  131.161030]  r4:ecf9c800
[  131.163588] [] (mem2mem_start_streaming) from 
[] (vb2_start_streaming+0x64/0x160)
[  131.172826]  r8:c1008908 r7:0001 r6:ed01b808 r5:ed01b934 
r4:ed01b810
[  131.179547] [] (vb2_start_streaming) from [] 
(vb2_core_streamon+0x10c/0x164)
[  131.188351]  r9:c071958c r8:c1008908 r7:0001 r6:ec1038f8 
r5: r4:ed01b808
[  131.196114] [] (vb2_core_streamon) from [] 
(vb2_streamon+0x34/0x58)

[  131.204133]  r5:40045612 r4:ed01b800
[  131.207733] [] (vb2_streamon) from [] 
(v4l2_m2m_streamon+0x24/0x3c)
[  131.215758] [] (v4l2_m2m_streamon) from [] 
(v4l2_m2m_ioctl_streamon+0x18/0x1c)

[  131.224732]  r5:40045612 r4:c072f354
[  131.228330] [] (v4l2_m2m_ioctl_streamon) from 
[] (v4l_streamon+0x24/0x28)
[  131.236878] [] (v4l_streamon) from [] 
(__video_do_ioctl+0x284/0x4f8)

[  131.244984]  r5:40045612 r4:ecc50800
[  131.248583] [] (__video_do_ioctl) from [] 
(video_usercopy+0x260/0x55c)
[  131.256866]  r10:0004 r9: r8:c1008908 r7:ed747dfc 
r6: r5:0004

[  131.264709]  r4:40045612
[  131.267265] [] (video_usercopy) from [] 
(video_ioctl2+0x14/0x1c)
[  131.275026]  r10:0036 r9:0003 r8:ed480068 r7:c0254840 
r6:ecc76000 r5:bea6ab38

[  131.282869]  r4:c071fcd8
[  131.285424] [] (video_ioctl2) from [] 
(v4l2_ioctl+0x44/0x5c)
[  131.292845] [] (v4l2_ioctl) from [] 
(do_vfs_ioctl+0xa8/0xa4c)

[  131.300343]  r5:bea6ab38 r4:c1008908
[  131.303940] [] (do_vfs_ioctl) from [] 
(ksys_ioctl+0x3c/0x60)
[  131.311355]  r10:0036 r9:ed746000 r8:bea6ab38 r7:40045612 
r6:0003 r5:ecc76000

[  131.319198]  r4:ecc76000
[  131.321752] [] (ksys_ioctl) from [] 
(sys_ioctl+0x10/0x14)
[  131.328907]  r9:ed746000 r8:c01011e4 r7:0036 r6:00010960 
r5: r4:00012620
[  131.336672] [] (sys_ioctl) from [] 
(ret_fast_syscall+0x0/0x28)

[  131.344256] Exception stack(0xed747fa8 to 0xed747ff0)
[  131.349327] 7fa0:   00012620  0003 
40045612 bea6ab38 0003
[  131.357524] 7fc0: 00012620  00010960 0036  
 45d8 bea6abac

[  131.365717] 7fe0: 0002312c bea6aaa4 00012308 45e58d5c


To aid in debugging this I created branch 'imx-mem2mem.stevel' in my
mediatree fork on github. I moved the mem2mem driver to the beginning
and added a few patches:

d317a7771c ("gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma 
buffers")
b4362162c0 ("media: imx: mem2mem: Use ipu_image_convert_adjust in try 
format")

4758be0cf8 ("gpu: ipu-v3: image-convert: Fix width/height alignment")
d069163c7f ("gpu: ipu-v3: image-convert: Fix input bytesperline clamp 
in adjust")


(feel free to squash some of those if you agree with them for v3).

By moving the mem2mem driver before the seam avoidance patches, and 
making
it independent of the image converter implementation, the driver can 
be tested with

and without the seam avoidance changes.

If you run a git rebase and build/run the kernel when stopped at 
b4362162c0 (e.g.
without the seam avoidance patches), you will find that the 

Re: [PATCH v2 00/16] i.MX media mem2mem scaler

2018-07-22 Thread Steve Longerbeam

Hi Philipp,


On 07/19/2018 08:30 AM, Philipp Zabel wrote:

Hi,

this is the second version of the i.MX mem2mem scaler series.
Patches 8 and 16 have been modified.

Changes since v1:
  - Fix inverted allow_overshoot logic
  - Correctly switch horizontal / vertical tile alignment when
determining seam positions with the 90° rotator active.


Yes, this fixes the specific rotation test that was broken
(720x480, UYVY --> 1280x768, UYVY, rotate 90).

But running more tests on this v2 reveals more issues. I chose a
somewhat random upscaling-only example as a first try:

640x480, YV12 --> full HD 2560x1600, YV12 (no rotation or flip).

This produces division by zero backtraces and the conversion hangs:


[  131.079978] Division by zero in kernel.
[  131.083853] CPU: 0 PID: 683 Comm: mx6-m2m Tainted: G W 
4.18.0-rc2-13448-g678218d #7

[  131.092830] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  131.099372] Backtrace:
[  131.101858] [] (dump_backtrace) from [] 
(show_stack+0x18/0x1c)

[  131.109450]  r7: r6:600f0013 r5: r4:c107db3c
[  131.115135] [] (show_stack) from [] 
(dump_stack+0xb4/0xe8)
[  131.122380] [] (dump_stack) from [] 
(__div0+0x18/0x20)
[  131.129274]  r9:ec37d800 r8:0003 r7: r6: 
r5: r4:ec37dae8

[  131.137036] [] (__div0) from [] (Ldiv0+0x8/0x10)
[  131.143425] [] (ipu_image_convert_prepare) from 
[] (mem2mem_start_streaming+0xe0/0x1c0)
[  131.153186]  r10:c0b9f640 r9:c071958c r8:0280 r7:01e0 
r6:32315659 r5:c1008908

[  131.161030]  r4:ecf9c800
[  131.163588] [] (mem2mem_start_streaming) from [] 
(vb2_start_streaming+0x64/0x160)

[  131.172826]  r8:c1008908 r7:0001 r6:ed01b808 r5:ed01b934 r4:ed01b810
[  131.179547] [] (vb2_start_streaming) from [] 
(vb2_core_streamon+0x10c/0x164)
[  131.188351]  r9:c071958c r8:c1008908 r7:0001 r6:ec1038f8 
r5: r4:ed01b808
[  131.196114] [] (vb2_core_streamon) from [] 
(vb2_streamon+0x34/0x58)

[  131.204133]  r5:40045612 r4:ed01b800
[  131.207733] [] (vb2_streamon) from [] 
(v4l2_m2m_streamon+0x24/0x3c)
[  131.215758] [] (v4l2_m2m_streamon) from [] 
(v4l2_m2m_ioctl_streamon+0x18/0x1c)

[  131.224732]  r5:40045612 r4:c072f354
[  131.228330] [] (v4l2_m2m_ioctl_streamon) from [] 
(v4l_streamon+0x24/0x28)
[  131.236878] [] (v4l_streamon) from [] 
(__video_do_ioctl+0x284/0x4f8)

[  131.244984]  r5:40045612 r4:ecc50800
[  131.248583] [] (__video_do_ioctl) from [] 
(video_usercopy+0x260/0x55c)
[  131.256866]  r10:0004 r9: r8:c1008908 r7:ed747dfc 
r6: r5:0004

[  131.264709]  r4:40045612
[  131.267265] [] (video_usercopy) from [] 
(video_ioctl2+0x14/0x1c)
[  131.275026]  r10:0036 r9:0003 r8:ed480068 r7:c0254840 
r6:ecc76000 r5:bea6ab38

[  131.282869]  r4:c071fcd8
[  131.285424] [] (video_ioctl2) from [] 
(v4l2_ioctl+0x44/0x5c)
[  131.292845] [] (v4l2_ioctl) from [] 
(do_vfs_ioctl+0xa8/0xa4c)

[  131.300343]  r5:bea6ab38 r4:c1008908
[  131.303940] [] (do_vfs_ioctl) from [] 
(ksys_ioctl+0x3c/0x60)
[  131.311355]  r10:0036 r9:ed746000 r8:bea6ab38 r7:40045612 
r6:0003 r5:ecc76000

[  131.319198]  r4:ecc76000
[  131.321752] [] (ksys_ioctl) from [] 
(sys_ioctl+0x10/0x14)
[  131.328907]  r9:ed746000 r8:c01011e4 r7:0036 r6:00010960 
r5: r4:00012620
[  131.336672] [] (sys_ioctl) from [] 
(ret_fast_syscall+0x0/0x28)

[  131.344256] Exception stack(0xed747fa8 to 0xed747ff0)
[  131.349327] 7fa0:   00012620  0003 
40045612 bea6ab38 0003
[  131.357524] 7fc0: 00012620  00010960 0036  
 45d8 bea6abac

[  131.365717] 7fe0: 0002312c bea6aaa4 00012308 45e58d5c


To aid in debugging this I created branch 'imx-mem2mem.stevel' in my
mediatree fork on github. I moved the mem2mem driver to the beginning
and added a few patches:

d317a7771c ("gpu: ipu-cpmem: add WARN_ON_ONCE() for unaligned dma buffers")
b4362162c0 ("media: imx: mem2mem: Use ipu_image_convert_adjust in try 
format")

4758be0cf8 ("gpu: ipu-v3: image-convert: Fix width/height alignment")
d069163c7f ("gpu: ipu-v3: image-convert: Fix input bytesperline clamp in 
adjust")


(feel free to squash some of those if you agree with them for v3).

By moving the mem2mem driver before the seam avoidance patches, and making
it independent of the image converter implementation, the driver can be 
tested with

and without the seam avoidance changes.

If you run a git rebase and build/run the kernel when stopped at 
b4362162c0 (e.g.
without the seam avoidance patches), you will find that the above 
640x480 -->

2560x1600 conversion succeeds, albeit with the expected visible seams at the
tile boundaries.

Also, I'm trying to parse the functions find_best_seam() and 
find_seams(). Can

you provide some more background on the behavior of those functions?

Steve


  - Fix SPDX-License-Identifier and remove superfluous license
text.
  - Fix uninitialized walign in try_fmt

Previous cover letter:

we have image conversion code for scaling 

[PATCH v2 00/16] i.MX media mem2mem scaler

2018-07-19 Thread Philipp Zabel
Hi,

this is the second version of the i.MX mem2mem scaler series.
Patches 8 and 16 have been modified.

Changes since v1:
 - Fix inverted allow_overshoot logic
 - Correctly switch horizontal / vertical tile alignment when
   determining seam positions with the 90° rotator active.
 - Fix SPDX-License-Identifier and remove superfluous license
   text.
 - Fix uninitialized walign in try_fmt

Previous cover letter:

we have image conversion code for scaling and colorspace conversion in
the IPUv3 base driver for a while. Since the IC hardware can only write
up to 1024x1024 pixel buffers, it scales to larger output buffers by
splitting the input and output frame into similarly sized tiles.

This causes the issue that the bilinear interpolation resets at the tile
boundary: instead of smoothly interpolating across the seam, there is a
jump in the input sample position that is very apparent for high
upscaling factors. This can be avoided by slightly changing the scaling
coefficients to let the left/top tiles overshoot their input sampling
into the first pixel / line of their right / bottom neighbors. The error
can be further reduced by letting tiles be differently sized and by
selecting seam positions that minimize the input sampling position error
at tile boundaries.
This is complicated by different DMA start address, burst size, and
rotator block size alignment requirements, depending on the input and
output pixel formats, and the fact that flipping happens in different
places depending on the rotation.

This series implements optimal seam position selection and seam hiding
with per-tile resizing coefficients and adds a scaling mem2mem device
to the imx-media driver.

regards
Philipp

Philipp Zabel (16):
  gpu: ipu-v3: ipu-ic: allow to manually set resize coefficients
  gpu: ipu-v3: image-convert: prepare for per-tile configuration
  gpu: ipu-v3: image-convert: calculate per-tile resize coefficients
  gpu: ipu-v3: image-convert: reconfigure IC per tile
  gpu: ipu-v3: image-convert: store tile top/left position
  gpu: ipu-v3: image-convert: calculate tile dimensions and offsets
outside fill_image
  gpu: ipu-v3: image-convert: move tile alignment helpers
  gpu: ipu-v3: image-convert: select optimal seam positions
  gpu: ipu-v3: image-convert: fix debug output for varying tile sizes
  gpu: ipu-v3: image-convert: relax tile width alignment for NV12 and
NV16
  gpu: ipu-v3: image-convert: relax input alignment restrictions
  gpu: ipu-v3: image-convert: relax output alignment restrictions
  gpu: ipu-v3: image-convert: fix bytesperline adjustment
  gpu: ipu-v3: image-convert: add some ASCII art to the exposition
  gpu: ipu-v3: image-convert: disable double buffering if necessary
  media: imx: add mem2mem device

 drivers/gpu/ipu-v3/ipu-ic.c   |  52 +-
 drivers/gpu/ipu-v3/ipu-image-convert.c| 870 +---
 drivers/staging/media/imx/Kconfig |   1 +
 drivers/staging/media/imx/Makefile|   1 +
 drivers/staging/media/imx/imx-media-dev.c |  11 +
 drivers/staging/media/imx/imx-media-mem2mem.c | 946 ++
 drivers/staging/media/imx/imx-media.h |  10 +
 include/video/imx-ipu-v3.h|   6 +
 8 files changed, 1758 insertions(+), 139 deletions(-)
 create mode 100644 drivers/staging/media/imx/imx-media-mem2mem.c

-- 
2.18.0