Re: [Mesa-dev] nouveau hardware decoding and vaDeriveImage

2017-08-21 Thread Philipp Kerling
Hi Julien,

thanks for providing some background on the issue. Now it makes a lot
more sense.

Am Mittwoch, den 16.08.2017, 10:40 +0100 schrieb Julien Isorce:
> Hi,
> 
> This issue is tracked here: https://bugs.freedesktop.org/show_bug.cgi?id=98285
> This is due to a limitation in the libva API, which only supports 1 FD
> per VASurface.
> It is enough for the intel-driver because NV12 is only 1 bo there, but
> in the nouveau driver NV12 is 2 separate bos (leaving aside the
> interlacing problem).
OK, guess we'll have to wait for libva to support this use case first.

> Note that this libva API limitation is also a problem for AMD hardware,
> because there NV12 is also 2 non-contiguous bos (see the functions
> 'si_video_buffer_create' and 'nouveau_vp3_video_buffer_create' in the
> Mesa source).
If I got you right, this basically means vaDeriveImage with the usual
NV12 configuration won't work on AMD either?

> I do not know if this is a HW requirement, or if using one contiguous
> bo with offset handling would work. Currently there are 2 independent
> calls to pipe->screen->resource_create, but maybe it would also work if
> the second resource somehow had the same underlying memory, just with
> an offset.
> 
> The workaround for now is to convert it to RGB (the following pipeline
> works on both NVIDIA and AMD hardware):
> 
> GST_GL_PLATFORM=egl GST_GL_API=gles2 gst-launch-1.0 filesrc
> location=test.mp4 ! qtdemux ! h264parse ! vaapih264dec !
> vaapipostproc ! "video/x-raw(memory:DMABuf), format=RGBA" !
> glimagesink
> 
> In the pipeline above, vaapipostproc will receive an NV12 VASurface as
> input, convert it to an RGBA VASurface and export it as 1 dmabuf FD,
> which glimagesink will then import using eglCreateImage. This is not
> full zero-copy, but at least it is zero-cpu-copy, the conversion being
> done on the GPU.
I see, this workaround could indeed be very useful. But if H.264
decoding still gives artifacts anyway (like Ilia said), I do wonder if
it's worth the trouble to implement this specifically for nouveau and
then end up never using it in production because the decoded video is
not presentable to the user. Probably best to wait for libva :-/

> 
> Cheers
> Julien

Best regards,
Philipp


Re: [Mesa-dev] nouveau hardware decoding and vaDeriveImage

2017-08-16 Thread Julien Isorce
Hi,

This issue is tracked here: https://bugs.freedesktop.org/show_bug.cgi?id=98285
This is due to a limitation in the libva API, which only supports 1 FD per
VASurface.
It is enough for the intel-driver because NV12 is only 1 bo there, but in the
nouveau driver NV12 is 2 separate bos (leaving aside the interlacing problem).

Note that this libva API limitation is also a problem for AMD hardware,
because there NV12 is also 2 non-contiguous bos (see the functions
'si_video_buffer_create' and 'nouveau_vp3_video_buffer_create' in the Mesa
source). I do not know if this is a HW requirement, or if using one
contiguous bo with offset handling would work. Currently there are 2
independent calls to pipe->screen->resource_create, but maybe it would also
work if the second resource somehow had the same underlying memory, just
with an offset.

The workaround for now is to convert it to RGB (the following pipeline works
on both NVIDIA and AMD hardware):

GST_GL_PLATFORM=egl GST_GL_API=gles2 gst-launch-1.0 filesrc
location=test.mp4 ! qtdemux ! h264parse ! vaapih264dec ! vaapipostproc !
"video/x-raw(memory:DMABuf), format=*RGBA*" ! glimagesink

In the pipeline above, vaapipostproc will receive an NV12 VASurface as input,
convert it to an RGBA VASurface and export it as 1 dmabuf FD, which
glimagesink will then import using eglCreateImage. This is not full
zero-copy, but at least it is zero-cpu-copy, the conversion being done on
the GPU.

Cheers
Julien


On 16 August 2017 at 03:06, Ilia Mirkin  wrote:

> [adding Julien, who did the nouveau va-api integration.]
>
> On Tue, Aug 15, 2017 at 9:59 AM, Philipp Kerling wrote:
> > Hi,
> >
> > I recently noticed that hardware video decoding with the nouveau driver
> > and libva does not play nicely with Wayland and decided to dig a bit.
> >
> > To use libva in conjunction with Wayland in EGL applications, you
> > usually export the surface dmabuf via vaDeriveImage &
> > vaAcquireBufferHandle and then import it in EGL via eglCreateImageKHR &
> > EGL_LINUX_DMA_BUF_EXT.
> > However, this does not work with nouveau.
> >
> > The first problem I could identify is that nouveau uses NV12 as buffer
> > format and mesa vaDeriveImage cannot handle that (see FourCCs handled
> > in vlVaDeriveImage). It seems to be easy to add though (just have to
> > set num_planes/offset etc. correctly), and I did so locally. Also,
> > nouveau always sets the interlaced flag on the buffer, which
> > vlVaDeriveImage also errors out on. Not sure why it does that and what
> > to do with that currently, I just removed the check locally.
>
> The hw decoder produces an interlaced NV12 image -- i.e. the Y plane
> is 2 images and the UV plane is 2 images, below one another. This is
> in firmware written in falcon (VP3+) or xtensa (VP2) microcode by
> NVIDIA. There is no open-source implementation, and thus would be
> tricky to change. It's possible that this firmware supports other
> output formats as well ... but we're not presently aware of it.
> Presumably the 10-bit decoding is done *somehow*.
>
> >
> > Then I hit another problem, which is that NV12 uses two planes and thus
> > has an offset into the buffer for the second plane, but nouveau cannot
> > handle offsets in eglCreateImageKHR (see nouveau_screen_bo_from_handle
> > which has a check for this).
>
> The VP2-era hardware decoder (G84..G200, minus G98) wants a single
> output buffer - you enter it in as a single base address and then an
> offset to the UV plane. It's a 32-bit offset but a 40-bit VA space.
> Later decoder generations (VP3+) have a separate address for both Y
> and UV planes. Actually looking over the VP2 logic (which I'm sad to
> say I wrote many years ago), it *might* be possible to have separate
> buffers for the final output image (in the second VP step).
>
> > Is there an easy/obvious way to add handling for the offset parameter
> > in nouveau_screen.c? This might be all that is currently breaking hwdec
> > on nouveau+Wayland, but I couldn't test it of course, so there might
> > still be other problems lurking.
>
> Well, strictly speaking this should be easy - while nouveau_bo has no
> offset, nv04_resource does - specifically things tend to use
> nv04_resource->address. I believe this offset stuff was added
> recently, and I don't know much about EGL or Wayland or ... well a lot
> of the new hotness. So e.g. in nv50_miptree_from_handle you could set
> the mt->base.address = bo->offset + whandle->offset; I wonder if the
> tiling stuff will be an issue -- the decoder can get *very* picky
> about tiling -- somehow you'll have to ensure that the images are
> allocated with proper tiling flags set -- see how nvc0_miptree_create
> calls nvc0_miptree_init_layout_video, which sets tile_mode to 0x10.
> That's not by coincidence.
>
> BTW, note that you can't use a decoder in one thread and GL commands
> in another thread. This will cause death in nouveau. Also note that
> there are known (but unfixed) artifacts when decoding some H.264
> videos.
>
> Feel free to join us in #nouveau on irc.freenode.net.

Re: [Mesa-dev] nouveau hardware decoding and vaDeriveImage

2017-08-15 Thread Ilia Mirkin
[adding Julien, who did the nouveau va-api integration.]

On Tue, Aug 15, 2017 at 9:59 AM, Philipp Kerling  wrote:
> Hi,
>
> I recently noticed that hardware video decoding with the nouveau driver
> and libva does not play nicely with Wayland and decided to dig a bit.
>
> To use libva in conjunction with Wayland in EGL applications, you
> usually export the surface dmabuf via vaDeriveImage &
> vaAcquireBufferHandle and then import it in EGL via eglCreateImageKHR &
> EGL_LINUX_DMA_BUF_EXT.
> However, this does not work with nouveau.
>
> The first problem I could identify is that nouveau uses NV12 as buffer
> format and mesa vaDeriveImage cannot handle that (see FourCCs handled
> in vlVaDeriveImage). It seems to be easy to add though (just have to
> set num_planes/offset etc. correctly), and I did so locally. Also,
> nouveau always sets the interlaced flag on the buffer, which
> vlVaDeriveImage also errors out on. Not sure why it does that and what
> to do with that currently, I just removed the check locally.

The hw decoder produces an interlaced NV12 image -- i.e. the Y plane
is 2 images and the UV plane is 2 images, below one another. This is
in firmware written in falcon (VP3+) or xtensa (VP2) microcode by
NVIDIA. There is no open-source implementation, and thus would be
tricky to change. It's possible that this firmware supports other
output formats as well ... but we're not presently aware of it.
Presumably the 10-bit decoding is done *somehow*.

>
> Then I hit another problem, which is that NV12 uses two planes and thus
> has an offset into the buffer for the second plane, but nouveau cannot
> handle offsets in eglCreateImageKHR (see nouveau_screen_bo_from_handle
> which has a check for this).

The VP2-era hardware decoder (G84..G200, minus G98) wants a single
output buffer - you enter it in as a single base address and then an
offset to the UV plane. It's a 32-bit offset but a 40-bit VA space.
Later decoder generations (VP3+) have a separate address for both Y
and UV planes. Actually looking over the VP2 logic (which I'm sad to
say I wrote many years ago), it *might* be possible to have separate
buffers for the final output image (in the second VP step).

> Is there an easy/obvious way to add handling for the offset parameter
> in nouveau_screen.c? This might be all that is currently breaking hwdec
> on nouveau+Wayland, but I couldn't test it of course, so there might
> still be other problems lurking.

Well, strictly speaking this should be easy - while nouveau_bo has no
offset, nv04_resource does - specifically things tend to use
nv04_resource->address. I believe this offset stuff was added
recently, and I don't know much about EGL or Wayland or ... well a lot
of the new hotness. So e.g. in nv50_miptree_from_handle you could set
the mt->base.address = bo->offset + whandle->offset; I wonder if the
tiling stuff will be an issue -- the decoder can get *very* picky
about tiling -- somehow you'll have to ensure that the images are
allocated with proper tiling flags set -- see how nvc0_miptree_create
calls nvc0_miptree_init_layout_video, which sets tile_mode to 0x10.
That's not by coincidence.
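
Spelled out, the change I have in mind amounts to something like this (an
illustration only -- I haven't verified it against the actual source, and
the exact spot may differ):

  /* In nv50_miptree_from_handle(), after importing the bo: honour the
   * plane offset carried in the winsys handle (e.g. the NV12 UV plane). */
  mt->base.address = bo->offset + whandle->offset;
  /* The imported image must still use the tiling layout the decoder
   * expects (see nvc0_miptree_init_layout_video and its tile_mode of
   * 0x10). */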

BTW, note that you can't use a decoder in one thread and GL commands
in another thread. This will cause death in nouveau. Also note that
there are known (but unfixed) artifacts when decoding some H.264
videos.

Feel free to join us in #nouveau on irc.freenode.net.

Cheers,

  -ilia


[Mesa-dev] nouveau hardware decoding and vaDeriveImage

2017-08-15 Thread Philipp Kerling
Hi,

I recently noticed that hardware video decoding with the nouveau driver
and libva does not play nicely with Wayland and decided to dig a bit.

To use libva in conjunction with Wayland in EGL applications, you
usually export the surface dmabuf via vaDeriveImage &
vaAcquireBufferHandle and then import it in EGL via eglCreateImageKHR &
EGL_LINUX_DMA_BUF_EXT.
However, this does not work with nouveau.
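
For context, the export path on the application side roughly looks like the
sketch below (error handling omitted; va_display and surface are assumed to
be set up already):

  /* Derive a VAImage from the decoded surface and get its dmabuf FD. */
  VAImage image;
  vaDeriveImage(va_display, surface, &image);

  VABufferInfo buf_info = { 0 };
  buf_info.mem_type = VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME;
  vaAcquireBufferHandle(va_display, image.buf, &buf_info);
  int dmabuf_fd = (int)buf_info.handle;   /* one FD for the whole surface */

  /* ... import dmabuf_fd via eglCreateImageKHR + EGL_LINUX_DMA_BUF_EXT ... */

  vaReleaseBufferHandle(va_display, image.buf);
  vaDestroyImage(va_display, image.image_id);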

The first problem I could identify is that nouveau uses NV12 as buffer
format and mesa vaDeriveImage cannot handle that (see FourCCs handled
in vlVaDeriveImage). It seems to be easy to add though (just have to
set num_planes/offset etc. correctly), and I did so locally. Also,
nouveau always sets the interlaced flag on the buffer, which
vlVaDeriveImage also errors out on. Not sure why it does that and what
to do with that currently, I just removed the check locally.
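
Roughly, what such a patch has to fill in for NV12 looks like the sketch
below -- ignoring whatever row/plane alignment the driver actually requires,
with w and h standing for the aligned surface dimensions:

  img->format.fourcc = VA_FOURCC_NV12;
  img->num_planes    = 2;
  img->pitches[0]    = w;        /* Y plane, 1 byte per pixel         */
  img->offsets[0]    = 0;
  img->pitches[1]    = w;        /* interleaved UV plane, half height */
  img->offsets[1]    = w * h;
  img->data_size     = w * h * 3 / 2;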

Then I hit another problem, which is that NV12 uses two planes and thus
has an offset into the buffer for the second plane, but nouveau cannot
handle offsets in eglCreateImageKHR (see nouveau_screen_bo_from_handle
which has a check for this).
Is there an easy/obvious way to add handling for the offset parameter
in nouveau_screen.c? This might be all that is currently breaking hwdec
on nouveau+Wayland, but I couldn't test it of course, so there might
still be other problems lurking.
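
For illustration, the import side for a derived NV12 image would pass an
attribute list along these lines to eglCreateImageKHR (a sketch only; width,
height, dmabuf_fd and the VAImage 'image' are assumed to come from the
export step, and the PLANE1 offset is exactly what nouveau currently
rejects):

  EGLint attribs[] = {
     EGL_WIDTH,                     width,
     EGL_HEIGHT,                    height,
     EGL_LINUX_DRM_FOURCC_EXT,      DRM_FORMAT_NV12,
     /* Y plane */
     EGL_DMA_BUF_PLANE0_FD_EXT,     dmabuf_fd,
     EGL_DMA_BUF_PLANE0_OFFSET_EXT, image.offsets[0],
     EGL_DMA_BUF_PLANE0_PITCH_EXT,  image.pitches[0],
     /* UV plane, at a non-zero offset into the same buffer */
     EGL_DMA_BUF_PLANE1_FD_EXT,     dmabuf_fd,
     EGL_DMA_BUF_PLANE1_OFFSET_EXT, image.offsets[1],
     EGL_DMA_BUF_PLANE1_PITCH_EXT,  image.pitches[1],
     EGL_NONE
  };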

Best regards,
Philipp Kerling