Re: [linux-sunxi] [PATCH v6 4/8] media: platform: Add Cedrus VPU decoder driver

2018-08-07 Thread Jernej Škrabec
Dne torek, 07. avgust 2018 ob 14:31:03 CEST je Paul Kocialkowski napisal(a):
> Hi,
> 
> On Fri, 2018-07-27 at 16:58 +0200, Jernej Škrabec wrote:
> > Dne petek, 27. julij 2018 ob 16:03:41 CEST je Jernej Škrabec napisal(a):
> > > Hi!
> > > 
> > > Dne sreda, 25. julij 2018 ob 12:02:52 CEST je Paul Kocialkowski 
napisal(a):
> > > > This introduces the Cedrus VPU driver that supports the VPU found in
> > > > Allwinner SoCs, also known as Video Engine. It is implemented through
> > > > a v4l2 m2m decoder device and a media device (used for media
> > > > requests).
> > > > So far, it only supports MPEG2 decoding.
> > > > 
> > > > Since this VPU is stateless, synchronization with media requests is
> > > > required in order to ensure consistency between frame headers that
> > > > contain metadata about the frame to process and the raw slice data
> > > > that
> > > > is used to generate the frame.
> > > > 
> > > > This driver was made possible thanks to the long-standing effort
> > > > carried out by the linux-sunxi community in the interest of reverse
> > > > engineering, documenting and implementing support for Allwinner VPU.
> > > > 
> > > > Signed-off-by: Paul Kocialkowski 
> > > > ---
> > > 
> > > 
> > > 
> > > > +void cedrus_dst_format_set(struct cedrus_dev *dev,
> > > > +  struct v4l2_pix_format_mplane *fmt)
> > > > +{
> > > > +   unsigned int width = fmt->width;
> > > > +   unsigned int height = fmt->height;
> > > > +   u32 chroma_size;
> > > > +   u32 reg;
> > > > +
> > > > +   switch (fmt->pixelformat) {
> > > > +   case V4L2_PIX_FMT_NV12:
> > > > +   chroma_size = ALIGN(width, 32) * ALIGN(height / 2, 32);
> > > 
> > > After some testing, it turns out that right aligment for untiled format
> > > is
> > > 16.
> > > 
> > > > +
> > > > +   reg = VE_PRIMARY_OUT_FMT_NV12 |
> > > > + VE_SECONDARY_SPECIAL_OUT_FMT_NV12;
> > > > +   cedrus_write(dev, VE_PRIMARY_OUT_FMT, reg);
> > > > +
> > > > +   reg = VE_CHROMA_BUF_LEN_SDRT(chroma_size / 2) |
> > > > + VE_SECONDARY_OUT_FMT_SPECIAL;
> > > > +   cedrus_write(dev, VE_CHROMA_BUF_LEN, reg);
> > > > +
> > > > +   reg = chroma_size / 2;
> > > > +   cedrus_write(dev, VE_PRIMARY_CHROMA_BUF_LEN, reg);
> > > > +
> > > > +   reg = VE_PRIMARY_FB_LINE_STRIDE_LUMA(ALIGN(width, 32)) |
> > > 
> > > ^ that one should be aligned to 16
> > > 
> > > > + VE_PRIMARY_FB_LINE_STRIDE_CHROMA(ALIGN(width / 2, 
> > > > 16));
> > 
> > It seems that CHROMA has to be aligned to 8 ^
> 
> I think the issue here is that the divider should be applied after the
> alignment, not before, such as: ALIGN(width, 16) / 2, which also
> provides a 8-aligned value.
> 
> Feel free to let me know if that causes any particular issue!

I think this is only semantics, it doesn't really matter if it is aligned to 
16 first and then divided by 2 or divided by 2 and then aligned to 8.

BTW, doc says that chroma should be aligned to 8 and for interleaved formats 
chroma line stride should be double... Maybe it should be just ALIGN(width, 
8)?

> 
> > That, with previosly comments, completely solves issues for one of my
> > sample video. However, there are still sample videos with issues. Those
> > are mostly rendered green with slight impressions of right image. Maybe
> > LUMA issue?
> Can you check whether these videos are interlaced? I think those don't
> do very well with our driver at this point.

Your driver works perfectly with interleaved videos. I tested that already. 
It's just that video playback quality is a little lower due to missing 
deinterlace step, but still watchable.

It turns out, that video in question [1] also doesn't work with ffmpeg + Intel 
vaapi on PC, while it works if SW decoding is used in ffmpeg. I guess that 
means that there is some bug in ffmpeg. But please be aware that ffmpeg spits 
out some warnings when processing it, so video file might be considered 
broken.

Best regards,
Jernej

[1] http://jernej.libreelec.tv/videos/20160219171612-Pink%20SI.mpg

> 
> Cheers and thanks for the useful work and feedback!
> 
> Paul
> 
> > Best regards,
> > Jernej
> > 
> > > > +   cedrus_write(dev, VE_PRIMARY_FB_LINE_STRIDE, reg);
> > > > +
> > > > +   break;
> > > > +   case V4L2_PIX_FMT_MB32_NV12:
> > > > +   default:
> > > > +   reg = VE_PRIMARY_OUT_FMT_MB32_NV12;
> > > > +   cedrus_write(dev, VE_PRIMARY_OUT_FMT, reg);
> > > > +
> > > > +   reg = VE_SECONDARY_OUT_FMT_MB32_NV12;
> > > > +   cedrus_write(dev, VE_CHROMA_BUF_LEN, reg);
> > > > +
> > > > +   break;
> > > > +   }
> > > > +}
> > > 
> > > 
> > > 
> > > > +static void cedrus_prepare_plane_format(struct cedrus_format *fmt,
> > > > +   struct v4l2_format *f,
> > > > +   

Re: [linux-sunxi] [PATCH v6 4/8] media: platform: Add Cedrus VPU decoder driver

2018-08-07 Thread Paul Kocialkowski
Hi,

On Fri, 2018-07-27 at 16:58 +0200, Jernej Škrabec wrote:
> Dne petek, 27. julij 2018 ob 16:03:41 CEST je Jernej Škrabec napisal(a):
> > Hi!
> > 
> > Dne sreda, 25. julij 2018 ob 12:02:52 CEST je Paul Kocialkowski napisal(a):
> > > This introduces the Cedrus VPU driver that supports the VPU found in
> > > Allwinner SoCs, also known as Video Engine. It is implemented through
> > > a v4l2 m2m decoder device and a media device (used for media requests).
> > > So far, it only supports MPEG2 decoding.
> > > 
> > > Since this VPU is stateless, synchronization with media requests is
> > > required in order to ensure consistency between frame headers that
> > > contain metadata about the frame to process and the raw slice data that
> > > is used to generate the frame.
> > > 
> > > This driver was made possible thanks to the long-standing effort
> > > carried out by the linux-sunxi community in the interest of reverse
> > > engineering, documenting and implementing support for Allwinner VPU.
> > > 
> > > Signed-off-by: Paul Kocialkowski 
> > > ---
> > 
> > 
> > 
> > > +void cedrus_dst_format_set(struct cedrus_dev *dev,
> > > +struct v4l2_pix_format_mplane *fmt)
> > > +{
> > > + unsigned int width = fmt->width;
> > > + unsigned int height = fmt->height;
> > > + u32 chroma_size;
> > > + u32 reg;
> > > +
> > > + switch (fmt->pixelformat) {
> > > + case V4L2_PIX_FMT_NV12:
> > > + chroma_size = ALIGN(width, 32) * ALIGN(height / 2, 32);
> > 
> > After some testing, it turns out that right aligment for untiled format is
> > 16.
> > > +
> > > + reg = VE_PRIMARY_OUT_FMT_NV12 |
> > > +   VE_SECONDARY_SPECIAL_OUT_FMT_NV12;
> > > + cedrus_write(dev, VE_PRIMARY_OUT_FMT, reg);
> > > +
> > > + reg = VE_CHROMA_BUF_LEN_SDRT(chroma_size / 2) |
> > > +   VE_SECONDARY_OUT_FMT_SPECIAL;
> > > + cedrus_write(dev, VE_CHROMA_BUF_LEN, reg);
> > > +
> > > + reg = chroma_size / 2;
> > > + cedrus_write(dev, VE_PRIMARY_CHROMA_BUF_LEN, reg);
> > > +
> > > + reg = VE_PRIMARY_FB_LINE_STRIDE_LUMA(ALIGN(width, 32)) |
> > 
> > ^ that one should be aligned to 16
> > 
> > > +   VE_PRIMARY_FB_LINE_STRIDE_CHROMA(ALIGN(width / 2, 16));
> 
> It seems that CHROMA has to be aligned to 8 ^

I think the issue here is that the divider should be applied after the
alignment, not before, such as: ALIGN(width, 16) / 2, which also
provides a 8-aligned value.

Feel free to let me know if that causes any particular issue!

> That, with previosly comments, completely solves issues for one of my sample 
> video. However, there are still sample videos with issues. Those are mostly 
> rendered green with slight impressions of right image. Maybe LUMA issue?

Can you check whether these videos are interlaced? I think those don't
do very well with our driver at this point.

Cheers and thanks for the useful work and feedback!

Paul

> Best regards,
> Jernej
> 
> > > + cedrus_write(dev, VE_PRIMARY_FB_LINE_STRIDE, reg);
> > > +
> > > + break;
> > > + case V4L2_PIX_FMT_MB32_NV12:
> > > + default:
> > > + reg = VE_PRIMARY_OUT_FMT_MB32_NV12;
> > > + cedrus_write(dev, VE_PRIMARY_OUT_FMT, reg);
> > > +
> > > + reg = VE_SECONDARY_OUT_FMT_MB32_NV12;
> > > + cedrus_write(dev, VE_CHROMA_BUF_LEN, reg);
> > > +
> > > + break;
> > > + }
> > > +}
> > 
> > 
> > 
> > > +static void cedrus_prepare_plane_format(struct cedrus_format *fmt,
> > > + struct v4l2_format *f,
> > > + unsigned int i)
> > > +{
> > > + struct v4l2_plane_pix_format *plane_fmt = >fmt.pix_mp.plane_fmt[i];
> > > + unsigned int width = f->fmt.pix_mp.width;
> > > + unsigned int height = f->fmt.pix_mp.height;
> > > + unsigned int sizeimage = plane_fmt->sizeimage;
> > > + unsigned int bytesperline = plane_fmt->bytesperline;
> > > +
> > > + switch (fmt->pixelformat) {
> > > + case V4L2_PIX_FMT_MPEG2_SLICE:
> > > + /* Zero bytes per line. */
> > > + bytesperline = 0;
> > > + break;
> > > +
> > > + case V4L2_PIX_FMT_MB32_NV12:
> > > + /* 32-aligned stride. */
> > > + bytesperline = ALIGN(width, 32);
> > > +
> > > + /* 32-aligned (luma) height. */
> > > + height = ALIGN(height, 32);
> > > +
> > > + if (i == 0)
> > > + /* 32-aligned luma size. */
> > > + sizeimage = bytesperline * height;
> > > + else if (i == 1)
> > > + /* 32-aligned chroma size with 2x2 sub-sampling. */
> > > + sizeimage = bytesperline * ALIGN(height / 2, 32);
> > > +
> > > + break;
> > > +
> > > + case V4L2_PIX_FMT_NV12:
> > > + /* 32-aligned stride. */
> > > + bytesperline = ALIGN(width, 32);
> > 
> > ^ and that one should be aligned to 16 too.
> > 
> > This partially fixes some MPEG2 videos I have tested with Kodi. I think
> > there are other aligment issues, but I 

Re: [linux-sunxi] [PATCH v6 4/8] media: platform: Add Cedrus VPU decoder driver

2018-08-07 Thread Paul Kocialkowski
Hi,

On Fri, 2018-07-27 at 16:03 +0200, Jernej Škrabec wrote:
> Hi!
> 
> Dne sreda, 25. julij 2018 ob 12:02:52 CEST je Paul Kocialkowski napisal(a):
> > This introduces the Cedrus VPU driver that supports the VPU found in
> > Allwinner SoCs, also known as Video Engine. It is implemented through
> > a v4l2 m2m decoder device and a media device (used for media requests).
> > So far, it only supports MPEG2 decoding.
> > 
> > Since this VPU is stateless, synchronization with media requests is
> > required in order to ensure consistency between frame headers that
> > contain metadata about the frame to process and the raw slice data that
> > is used to generate the frame.
> > 
> > This driver was made possible thanks to the long-standing effort
> > carried out by the linux-sunxi community in the interest of reverse
> > engineering, documenting and implementing support for Allwinner VPU.
> > 
> > Signed-off-by: Paul Kocialkowski 
> > ---
> 
> 
> 
> > +void cedrus_dst_format_set(struct cedrus_dev *dev,
> > +  struct v4l2_pix_format_mplane *fmt)
> > +{
> > +   unsigned int width = fmt->width;
> > +   unsigned int height = fmt->height;
> > +   u32 chroma_size;
> > +   u32 reg;
> > +
> > +   switch (fmt->pixelformat) {
> > +   case V4L2_PIX_FMT_NV12:
> > +   chroma_size = ALIGN(width, 32) * ALIGN(height / 2, 32);
> 
> After some testing, it turns out that right aligment for untiled format is 16.

Thanks for looking into it, figuring out the alignment constraints from
the Allwinner reference code is just a plain headache... I confirm that
aligning to 16 works and allows properly untiling previously-broken
videos.

I've also removed the divison factors out of the alignment, like it's
done in the reference code.

> > +
> > +   reg = VE_PRIMARY_OUT_FMT_NV12 |
> > + VE_SECONDARY_SPECIAL_OUT_FMT_NV12;
> > +   cedrus_write(dev, VE_PRIMARY_OUT_FMT, reg);
> > +
> > +   reg = VE_CHROMA_BUF_LEN_SDRT(chroma_size / 2) |
> > + VE_SECONDARY_OUT_FMT_SPECIAL;
> > +   cedrus_write(dev, VE_CHROMA_BUF_LEN, reg);
> > +
> > +   reg = chroma_size / 2;
> > +   cedrus_write(dev, VE_PRIMARY_CHROMA_BUF_LEN, reg);
> > +
> > +   reg = VE_PRIMARY_FB_LINE_STRIDE_LUMA(ALIGN(width, 32)) |
> 
> ^ that one should be aligned to 16

Will do in v7.

> > + VE_PRIMARY_FB_LINE_STRIDE_CHROMA(ALIGN(width / 2, 16));
> > +   cedrus_write(dev, VE_PRIMARY_FB_LINE_STRIDE, reg);
> > +
> > +   break;
> > +   case V4L2_PIX_FMT_MB32_NV12:
> > +   default:
> > +   reg = VE_PRIMARY_OUT_FMT_MB32_NV12;
> > +   cedrus_write(dev, VE_PRIMARY_OUT_FMT, reg);
> > +
> > +   reg = VE_SECONDARY_OUT_FMT_MB32_NV12;
> > +   cedrus_write(dev, VE_CHROMA_BUF_LEN, reg);
> > +
> > +   break;
> > +   }
> > +}
> 
> 
> 
> > +static void cedrus_prepare_plane_format(struct cedrus_format *fmt,
> > +   struct v4l2_format *f,
> > +   unsigned int i)
> > +{
> > +   struct v4l2_plane_pix_format *plane_fmt = >fmt.pix_mp.plane_fmt[i];
> > +   unsigned int width = f->fmt.pix_mp.width;
> > +   unsigned int height = f->fmt.pix_mp.height;
> > +   unsigned int sizeimage = plane_fmt->sizeimage;
> > +   unsigned int bytesperline = plane_fmt->bytesperline;
> > +
> > +   switch (fmt->pixelformat) {
> > +   case V4L2_PIX_FMT_MPEG2_SLICE:
> > +   /* Zero bytes per line. */
> > +   bytesperline = 0;
> > +   break;
> > +
> > +   case V4L2_PIX_FMT_MB32_NV12:
> > +   /* 32-aligned stride. */
> > +   bytesperline = ALIGN(width, 32);
> > +
> > +   /* 32-aligned (luma) height. */
> > +   height = ALIGN(height, 32);
> > +
> > +   if (i == 0)
> > +   /* 32-aligned luma size. */
> > +   sizeimage = bytesperline * height;
> > +   else if (i == 1)
> > +   /* 32-aligned chroma size with 2x2 sub-sampling. */
> > +   sizeimage = bytesperline * ALIGN(height / 2, 32);
> > +
> > +   break;
> > +
> > +   case V4L2_PIX_FMT_NV12:
> > +   /* 32-aligned stride. */
> > +   bytesperline = ALIGN(width, 32);
> 
> ^ and that one should be aligned to 16 too.
> 
> This partially fixes some MPEG2 videos I have tested with Kodi. I think there 
> are other aligment issues, but I have to find them first.

I also found that the height (used for sizeimage calculation) has to be
aligned to 16 in this case, otherwise some garbage can be seen on the
top rows of the untiled frame with non-aligned heights.

Cheers and thanks again for your findings!

Paul

> Best regards,
> Jernej
> 
> > +
> > +   if (i == 0)
> > +   /* Regular luma size. */
> > +   sizeimage = bytesperline * height;
> > +   else if (i == 1)
> > +   /* Regular chroma size with 2x2 sub-sampling. */
> > +  

Re: [linux-sunxi] [PATCH v6 4/8] media: platform: Add Cedrus VPU decoder driver

2018-08-07 Thread Paul Kocialkowski
Hi,

On Sun, 2018-07-29 at 09:58 +0200, Jernej Škrabec wrote:
> Hi!
> 
> Dne sreda, 25. julij 2018 ob 12:02:52 CEST je Paul Kocialkowski napisal(a):
> > This introduces the Cedrus VPU driver that supports the VPU found in
> > Allwinner SoCs, also known as Video Engine. It is implemented through
> > a v4l2 m2m decoder device and a media device (used for media requests).
> > So far, it only supports MPEG2 decoding.
> > 
> > Since this VPU is stateless, synchronization with media requests is
> > required in order to ensure consistency between frame headers that
> > contain metadata about the frame to process and the raw slice data that
> > is used to generate the frame.
> > 
> > This driver was made possible thanks to the long-standing effort
> > carried out by the linux-sunxi community in the interest of reverse
> > engineering, documenting and implementing support for Allwinner VPU.
> > 
> > Signed-off-by: Paul Kocialkowski 
> > ---
> 
> 

[...]

> > +static void cedrus_mpeg2_setup(struct cedrus_ctx *ctx, struct cedrus_run
> > *run) +{
> > +   const struct v4l2_ctrl_mpeg2_slice_params *slice_params;
> > +   const struct v4l2_ctrl_mpeg2_quantization *quantization;
> > +   dma_addr_t src_buf_addr, dst_luma_addr, dst_chroma_addr;
> > +   dma_addr_t fwd_luma_addr, fwd_chroma_addr;
> > +   dma_addr_t bwd_luma_addr, bwd_chroma_addr;
> > +   struct cedrus_dev *dev = ctx->dev;
> > +   u32 vld_end, vld_len;
> > +   const u8 *matrix;
> > +   unsigned int i;
> > +   u32 reg;
> > +
> > +   slice_params = run->mpeg2.slice_params;
> > +   quantization = run->mpeg2.quantization;
> > +
> > +   /* Activate MPEG engine. */
> > +   cedrus_engine_enable(dev, CEDRUS_CODEC_MPEG2);
> > +
> > +   /* Set intra quantization matrix. */
> > +
> > +   if (quantization && quantization->load_intra_quantiser_matrix)
> > +   matrix = quantization->intra_quantiser_matrix;
> > +   else
> > +   matrix = intra_quantization_matrix_default;
> > +
> > +   for (i = 0; i < 64; i++) {
> > +   reg = VE_DEC_MPEG_IQMINPUT_WEIGHT(i, matrix[i]);
> > +   reg |= VE_DEC_MPEG_IQMINPUT_FLAG_INTRA;
> > +
> > +   cedrus_write(dev, VE_DEC_MPEG_IQMINPUT, reg);
> > +   }
> > +
> > +   /* Set non-intra quantization matrix. */
> > +
> > +   if (quantization && quantization->load_non_intra_quantiser_matrix)
> > +   matrix = quantization->non_intra_quantiser_matrix;
> > +   else
> > +   matrix = non_intra_quantization_matrix_default;
> > +
> > +   for (i = 0; i < 64; i++) {
> > +   reg = VE_DEC_MPEG_IQMINPUT_WEIGHT(i, matrix[i]);
> > +   reg |= VE_DEC_MPEG_IQMINPUT_FLAG_NON_INTRA;
> > +
> > +   cedrus_write(dev, VE_DEC_MPEG_IQMINPUT, reg);
> > +   }
> > +
> > +   /* Set MPEG picture header. */
> > +
> > +   reg = VE_DEC_MPEG_MP12HDR_SLICE_TYPE(slice_params->slice_type);
> > +   reg |= VE_DEC_MPEG_MP12HDR_F_CODE(0, 0, slice_params->f_code[0][0]);
> > +   reg |= VE_DEC_MPEG_MP12HDR_F_CODE(0, 1, slice_params->f_code[0][1]);
> > +   reg |= VE_DEC_MPEG_MP12HDR_F_CODE(1, 0, slice_params->f_code[1][0]);
> > +   reg |= VE_DEC_MPEG_MP12HDR_F_CODE(1, 1, slice_params->f_code[1][1]);
> > +   reg |=
> > VE_DEC_MPEG_MP12HDR_INTRA_DC_PRECISION(slice_params->intra_dc_precision);
> > +   reg |=
> > VE_DEC_MPEG_MP12HDR_INTRA_PICTURE_STRUCTURE(slice_params->picture_structure
> > ); +reg |=
> > VE_DEC_MPEG_MP12HDR_TOP_FIELD_FIRST(slice_params->top_field_first); +   
> > reg
> > > =
> > 
> > VE_DEC_MPEG_MP12HDR_FRAME_PRED_FRAME_DCT(slice_params->frame_pred_frame_dct
> > ); +reg |=
> > VE_DEC_MPEG_MP12HDR_CONCEALMENT_MOTION_VECTORS(slice_params->concealment_mo
> > tion_vectors); +reg |=
> > VE_DEC_MPEG_MP12HDR_Q_SCALE_TYPE(slice_params->q_scale_type); + reg |=
> > VE_DEC_MPEG_MP12HDR_INTRA_VLC_FORMAT(slice_params->intra_vlc_format); + 
> 
> reg
> > > = VE_DEC_MPEG_MP12HDR_ALTERNATE_SCAN(slice_params->alternate_scan); + 
> > > reg
> > > = VE_DEC_MPEG_MP12HDR_FULL_PEL_FORWARD_VECTOR(0);
> > 
> > +   reg |= VE_DEC_MPEG_MP12HDR_FULL_PEL_BACKWARD_VECTOR(0);
> > +
> > +   cedrus_write(dev, VE_DEC_MPEG_MP12HDR, reg);
> > +
> > +   /* Set frame dimensions. */
> > +
> > +   reg = VE_DEC_MPEG_PICCODEDSIZE_WIDTH(slice_params->width);
> > +   reg |= VE_DEC_MPEG_PICCODEDSIZE_HEIGHT(slice_params->height);
> > +
> > +   cedrus_write(dev, VE_DEC_MPEG_PICCODEDSIZE, reg);
> > +
> > +   reg = VE_DEC_MPEG_PICBOUNDSIZE_WIDTH(slice_params->width);
> > +   reg |= VE_DEC_MPEG_PICBOUNDSIZE_HEIGHT(slice_params->height);
> > +
> > +   cedrus_write(dev, VE_DEC_MPEG_PICBOUNDSIZE, reg);
> > +
> > +   /* Forward and backward prediction reference buffers. */
> > +
> > +   fwd_luma_addr = cedrus_dst_buf_addr(ctx, 
> > slice_params->forward_ref_index,
> > 0); +   fwd_chroma_addr = cedrus_dst_buf_addr(ctx,
> > slice_params->forward_ref_index, 1); +
> > +   cedrus_write(dev, VE_DEC_MPEG_FWD_REF_LUMA_ADDR, fwd_luma_addr);
> > +   cedrus_write(dev, VE_DEC_MPEG_FWD_REF_CHROMA_ADDR, fwd_chroma_addr);
> > +
> >