Re: [Libva] [PATCH 2/4] Set the pipeline to use the new VP8 encoding shaders on BSW

2017-01-22 Thread Xiang, Haihao

Thanks, Mark. I am looking into the issue now.


> On 13/01/17 00:02, Xiang, Haihao wrote:
> > 
> > Thanks for the detailed info, I will look into the issue. BTW can you try
> > other vaapi based tools, such as yamitranscode?
> 
> Yes, I can now reproduce this with yamitranscode as well.
> 
> Given the video below, extract the H.264 stream and then:
> 
> ./yamitranscode -i in.h264 -f 60 --rcmode CBR --ow 1920 --oh 1080 -c VP8 -b
> 500 -N 1000 -o out.ivf
> 
> consistently gives both a broken output stream (though partially playable) and
> a GPU hang somewhere in the middle on Skylake.
> 
> Example GPU error dump: <http://ixia.jkqxz.net/~mrt/libva/vp8/sys_class_drm_ca
> rd0_error_yami>.
> 
> Thanks,
> 
> - Mark
> 
> 
> > > -Original Message-
> > > From: Mark Thompson [mailto:s...@jkqxz.net]
> > > Sent: Friday, January 13, 2017 6:11 AM
> > > To: libva@lists.freedesktop.org; Xiang, Haihao 
> > > Subject: Re: [Libva] [PATCH 2/4] Set the pipeline to use the new VP8
> > > encoding
> > > shaders on BSW
> > > 
> > > On 12/01/17 07:30, Xiang, Haihao wrote:
> > > > 
> > > > Hi Mark,
> > > > 
> > > > Can you reproduce the issue you mentioned below? If yes, I would like
> > > > to fix it in the new version of the patch series.
> > > 
> > > Hi,
> > > 
> > > Yes, I can reproduce it consistently on both Skylake and Kaby Lake.  Some
> > > detailed instructions follow...
> > > 
> > > 
> > > Get standard test input:
> > > <http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_10
> > > 80p_60fps_normal.mp4>.  (The input file does matter somewhat: I tried some
> > > others and found it harder to reproduce.  Maybe the highly variable
> > > complexity here helps to show the problem.)
> > > 
> > > Get current libav from git: .
> > > 
> > > Apply patches adding framerate configuration and VP8 encode support to
> > > libav: <http://ixia.jkqxz.net/~mrt/libva/vp8/0001-vaapi_encode-Pass-
> > > framerate-parameters-to-driver.patch>,
> > > <http://ixia.jkqxz.net/~mrt/libva/vp8/0002-vaapi_encode-Add-VP8-
> > > support.patch>.
> > > 
> > > Configure libav with --enable-vaapi and build.
> > > 
> > > 
> > > Now run a transcode from the H.264 of the input file to VP8.  What happens
> > > varies with the bitrate selected (this is now on Skylake GT2, 6300):
> > > 
> > > 
> > > ./avconv -y -threads 1 -vaapi_device /dev/dri/renderD128 -hwaccel vaapi -
> > > hwaccel_output_format vaapi -i bbb_sunflower_1080p_60fps_normal.mp4 -
> > > an -c:v vp8_vaapi -r 60 -b:v 5M out.webm
> > > 
> > > (CBR at 5Mbps)  Everything works and the output looks good: yay!  (This is
> > > an
> > > immense improvement over the current driver - if you run the same
> > > command there, the output is working but terrible quality.)
> > > 
> > > 
> > > ./avconv -y -threads 1 -vaapi_device /dev/dri/renderD128 -hwaccel vaapi -
> > > hwaccel_output_format vaapi -i bbb_sunflower_1080p_60fps_normal.mp4 -
> > > an -c:v vp8_vaapi -r 60 -b:v 5M out.webm
> > > 
> > > (CBR at 2Mbps)  Mostly works, but the output is broken at times and the
> > > bitrate target is often missed by a long way.
> > > 
> > > 
> > > ./avconv -y -threads 1 -vaapi_device /dev/dri/renderD128 -hwaccel vaapi -
> > > hwaccel_output_format vaapi -i bbb_sunflower_1080p_60fps_normal.mp4 -
> > > an -c:v vp8_vaapi -r 60 -b:v 1M out.webm
> > > 
> > > (CBR at 1Mbps)  Consistently hangs at around frame 570.
> > > 
> > > 
> > > [1321697.079583] drm/i915: Resetting chip after gpu hang [1321697.081571]
> > > [drm] GuC firmware load skipped [1321699.063831] [drm] RC6 on
> > > 
> > > /sys/class/drm/card0/error for one case:
> > > <http://ixia.jkqxz.net/~mrt/libva/vp8/sys_class_drm_card0_error>.
> > > 
> > > Backtrace of userspace at the hang point:
> > > 
> > > #0  0x76351cc7 in ioctl () at ../sysdeps/unix/syscall-
> > > template.S:84
> > > #1  0x75821708 in drmIoctl () from /usr/lib/x86_64-linux-
> > > gnu/libdrm.so.2
> > > #2  0x748e3e00 in ?? () from /usr/lib/x86_64-linux-
> > > gnu/libdrm_intel.so.1
> > > #3  0x748e4036 in ?? () from /usr/lib/x86_64-linux-
> > > gnu/libdrm_intel.so.1
> > > #4  0x74bded57

Re: [Libva] [PATCH 2/4] Set the pipeline to use the new VP8 encoding shaders on BSW

2017-01-22 Thread Mark Thompson
On 13/01/17 00:02, Xiang, Haihao wrote:
> 
> Thanks for the detailed info, I will look into the issue. BTW can you try 
> other vaapi based tools, such as yamitranscode?

Yes, I can now reproduce this with yamitranscode as well.

Given the video below, extract the H.264 stream and then:

./yamitranscode -i in.h264 -f 60 --rcmode CBR --ow 1920 --oh 1080 -c VP8 -b 500 
-N 1000 -o out.ivf

consistently gives both a broken output stream (though partially playable) and 
a GPU hang somewhere in the middle on Skylake.

Example GPU error dump: 
<http://ixia.jkqxz.net/~mrt/libva/vp8/sys_class_drm_card0_error_yami>.

Thanks,

- Mark


>> -Original Message-
>> From: Mark Thompson [mailto:s...@jkqxz.net]
>> Sent: Friday, January 13, 2017 6:11 AM
>> To: libva@lists.freedesktop.org; Xiang, Haihao 
>> Subject: Re: [Libva] [PATCH 2/4] Set the pipeline to use the new VP8 encoding
>> shaders on BSW
>>
>> On 12/01/17 07:30, Xiang, Haihao wrote:
>>>
>>> Hi Mark,
>>>
>>> Can you reproduce the issue you mentioned below? If yes, I would like
>>> to fix it in the new version of the patch series.
>>
>> Hi,
>>
>> Yes, I can reproduce it consistently on both Skylake and Kaby Lake.  Some
>> detailed instructions follow...
>>
>>
>> Get standard test input:
>> <http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_10
>> 80p_60fps_normal.mp4>.  (The input file does matter somewhat: I tried some
>> others and found it harder to reproduce.  Maybe the highly variable
>> complexity here helps to show the problem.)
>>
>> Get current libav from git: .
>>
>> Apply patches adding framerate configuration and VP8 encode support to
>> libav: <http://ixia.jkqxz.net/~mrt/libva/vp8/0001-vaapi_encode-Pass-
>> framerate-parameters-to-driver.patch>,
>> <http://ixia.jkqxz.net/~mrt/libva/vp8/0002-vaapi_encode-Add-VP8-
>> support.patch>.
>>
>> Configure libav with --enable-vaapi and build.
>>
>>
>> Now run a transcode from the H.264 of the input file to VP8.  What happens
>> varies with the bitrate selected (this is now on Skylake GT2, 6300):
>>
>>
>> ./avconv -y -threads 1 -vaapi_device /dev/dri/renderD128 -hwaccel vaapi -
>> hwaccel_output_format vaapi -i bbb_sunflower_1080p_60fps_normal.mp4 -
>> an -c:v vp8_vaapi -r 60 -b:v 5M out.webm
>>
>> (CBR at 5Mbps)  Everything works and the output looks good: yay!  (This is an
>> immense improvement over the current driver - if you run the same
>> command there, the output is working but terrible quality.)
>>
>>
>> ./avconv -y -threads 1 -vaapi_device /dev/dri/renderD128 -hwaccel vaapi -
>> hwaccel_output_format vaapi -i bbb_sunflower_1080p_60fps_normal.mp4 -
>> an -c:v vp8_vaapi -r 60 -b:v 5M out.webm
>>
>> (CBR at 2Mbps)  Mostly works, but the output is broken at times and the
>> bitrate target is often missed by a long way.
>>
>>
>> ./avconv -y -threads 1 -vaapi_device /dev/dri/renderD128 -hwaccel vaapi -
>> hwaccel_output_format vaapi -i bbb_sunflower_1080p_60fps_normal.mp4 -
>> an -c:v vp8_vaapi -r 60 -b:v 1M out.webm
>>
>> (CBR at 1Mbps)  Consistently hangs at around frame 570.
>>
>>
>> [1321697.079583] drm/i915: Resetting chip after gpu hang [1321697.081571]
>> [drm] GuC firmware load skipped [1321699.063831] [drm] RC6 on
>>
>> /sys/class/drm/card0/error for one case:
>> <http://ixia.jkqxz.net/~mrt/libva/vp8/sys_class_drm_card0_error>.
>>
>> Backtrace of userspace at the hang point:
>>
>> #0  0x76351cc7 in ioctl () at ../sysdeps/unix/syscall-template.S:84
>> #1  0x75821708 in drmIoctl () from /usr/lib/x86_64-linux-
>> gnu/libdrm.so.2
>> #2  0x748e3e00 in ?? () from /usr/lib/x86_64-linux-
>> gnu/libdrm_intel.so.1
>> #3  0x748e4036 in ?? () from /usr/lib/x86_64-linux-
>> gnu/libdrm_intel.so.1
>> #4  0x74bded57 in intel_batchbuffer_flush (batch=0x56c03240)
>> at ../../src/intel_batchbuffer.c:147
>> #5  0x74b9c821 in i965_run_kernel_media_object
>> (ctx=0x56ad15b0, encoder_context=0x56bfff10,
>> gpe_context=0x56b6da38, media_function=11, param=0x7fffd5f0)
>> at ../../src/i965_encoder_vp8.c:2174
>> #6  0x74baa044 in i965_encoder_vp8_pak_tpu (ctx=0x56ad15b0,
>> encode_state=0x56ae5cd0, encoder_context=0x56bfff10)
>> at ../../src/i965_encoder_vp8.c:6503
>> #7  0x74baa5a7 in i965_encoder_vp8_pak_pipeline
>> (ctx=0x56ad15b0, profile=VAProfileVP8Version0_3,
>> encode_state=0x56ae5cd0, encoder_c

Re: [Libva] [PATCH 2/4] Set the pipeline to use the new VP8 encoding shaders on BSW

2017-01-12 Thread Xiang, Haihao

Thanks for the detailed info, I will look into the issue. BTW can you try other 
vaapi based tools, such as yamitranscode?

Thanks
Haihao

>-Original Message-
>From: Mark Thompson [mailto:s...@jkqxz.net]
>Sent: Friday, January 13, 2017 6:11 AM
>To: libva@lists.freedesktop.org; Xiang, Haihao 
>Subject: Re: [Libva] [PATCH 2/4] Set the pipeline to use the new VP8 encoding
>shaders on BSW
>
>On 12/01/17 07:30, Xiang, Haihao wrote:
>>
>> Hi Mark,
>>
>> Can you reproduce the issue you mentioned below? If yes, I would like
>> to fix it in the new version of the patch series.
>
>Hi,
>
>Yes, I can reproduce it consistently on both Skylake and Kaby Lake.  Some
>detailed instructions follow...
>
>
>Get standard test input:
><http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_10
>80p_60fps_normal.mp4>.  (The input file does matter somewhat: I tried some
>others and found it harder to reproduce.  Maybe the highly variable
>complexity here helps to show the problem.)
>
>Get current libav from git: .
>
>Apply patches adding framerate configuration and VP8 encode support to
>libav: <http://ixia.jkqxz.net/~mrt/libva/vp8/0001-vaapi_encode-Pass-
>framerate-parameters-to-driver.patch>,
><http://ixia.jkqxz.net/~mrt/libva/vp8/0002-vaapi_encode-Add-VP8-
>support.patch>.
>
>Configure libav with --enable-vaapi and build.
>
>
>Now run a transcode from the H.264 of the input file to VP8.  What happens
>varies with the bitrate selected (this is now on Skylake GT2, 6300):
>
>
>./avconv -y -threads 1 -vaapi_device /dev/dri/renderD128 -hwaccel vaapi -
>hwaccel_output_format vaapi -i bbb_sunflower_1080p_60fps_normal.mp4 -
>an -c:v vp8_vaapi -r 60 -b:v 5M out.webm
>
>(CBR at 5Mbps)  Everything works and the output looks good: yay!  (This is an
>immense improvement over the current driver - if you run the same
>command there, the output is working but terrible quality.)
>
>
>./avconv -y -threads 1 -vaapi_device /dev/dri/renderD128 -hwaccel vaapi -
>hwaccel_output_format vaapi -i bbb_sunflower_1080p_60fps_normal.mp4 -
>an -c:v vp8_vaapi -r 60 -b:v 5M out.webm
>
>(CBR at 2Mbps)  Mostly works, but the output is broken at times and the
>bitrate target is often missed by a long way.
>
>
>./avconv -y -threads 1 -vaapi_device /dev/dri/renderD128 -hwaccel vaapi -
>hwaccel_output_format vaapi -i bbb_sunflower_1080p_60fps_normal.mp4 -
>an -c:v vp8_vaapi -r 60 -b:v 1M out.webm
>
>(CBR at 1Mbps)  Consistently hangs at around frame 570.
>
>
>[1321697.079583] drm/i915: Resetting chip after gpu hang [1321697.081571]
>[drm] GuC firmware load skipped [1321699.063831] [drm] RC6 on
>
>/sys/class/drm/card0/error for one case:
><http://ixia.jkqxz.net/~mrt/libva/vp8/sys_class_drm_card0_error>.
>
>Backtrace of userspace at the hang point:
>
>#0  0x76351cc7 in ioctl () at ../sysdeps/unix/syscall-template.S:84
>#1  0x75821708 in drmIoctl () from /usr/lib/x86_64-linux-
>gnu/libdrm.so.2
>#2  0x748e3e00 in ?? () from /usr/lib/x86_64-linux-
>gnu/libdrm_intel.so.1
>#3  0x748e4036 in ?? () from /usr/lib/x86_64-linux-
>gnu/libdrm_intel.so.1
>#4  0x74bded57 in intel_batchbuffer_flush (batch=0x56c03240)
>at ../../src/intel_batchbuffer.c:147
>#5  0x74b9c821 in i965_run_kernel_media_object
>(ctx=0x56ad15b0, encoder_context=0x56bfff10,
>gpe_context=0x56b6da38, media_function=11, param=0x7fffd5f0)
>at ../../src/i965_encoder_vp8.c:2174
>#6  0x74baa044 in i965_encoder_vp8_pak_tpu (ctx=0x56ad15b0,
>encode_state=0x56ae5cd0, encoder_context=0x56bfff10)
>at ../../src/i965_encoder_vp8.c:6503
>#7  0x74baa5a7 in i965_encoder_vp8_pak_pipeline
>(ctx=0x56ad15b0, profile=VAProfileVP8Version0_3,
>encode_state=0x56ae5cd0, encoder_context=0x56bfff10)
>at ../../src/i965_encoder_vp8.c:6620
>#8  0x74b988b3 in intel_encoder_end_picture (ctx=0x56ad15b0,
>profile=VAProfileVP8Version0_3, codec_state=0x56ae5cd0,
>hw_context=0x56bfff10) at ../../src/i965_encoder.c:1313
>#9  0x74b8aa23 in i965_EndPicture (ctx=0x56ad15b0,
>context=33554433) at ../../src/i965_drv_video.c:3588
>#10 0x776744ca in vaEndPicture (dpy=0x56ad1540,
>context=33554433) at ../../git/va/va.c:1285
>#11 0x55df76d8 in vaapi_encode_issue (avctx=0x56b5f100,
>pic=0x56c64d20) at
>/home/mrt/video/libav/vp8/libavcodec/vaapi_encode.c:387
>#12 0x55df7f29 in vaapi_encode_step (avctx=0x56b5f100,
>target=0x56c64d20) at
>/home/mrt/video/libav/vp8/libavcodec/vaapi_encode.c:608
>#13 0x55df8be9 in ff_vaapi_encode2 (avctx=0x56b5f100,
>pkt=0x56b19380, input_image=0x56b1

Re: [Libva] [PATCH 2/4] Set the pipeline to use the new VP8 encoding shaders on BSW

2017-01-12 Thread Mark Thompson
On 12/01/17 07:30, Xiang, Haihao wrote:
> 
> Hi Mark,
> 
> Can you reproduce the issue you mentioned below? If yes, I would like to fix 
> it
> in the new version of the patch series. 

Hi,

Yes, I can reproduce it consistently on both Skylake and Kaby Lake.  Some 
detailed instructions follow...


Get standard test input: 
<http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_60fps_normal.mp4>.
  (The input file does matter somewhat: I tried some others and found it harder 
to reproduce.  Maybe the highly variable complexity here helps to show the 
problem.)

Get current libav from git: .

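Apply patches adding framerate configuration and VP8 encode support to libav: 
<http://ixia.jkqxz.net/~mrt/libva/vp8/0001-vaapi_encode-Pass-framerate-parameters-to-driver.patch>,
<http://ixia.jkqxz.net/~mrt/libva/vp8/0002-vaapi_encode-Add-VP8-support.patch>.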

Configure libav with --enable-vaapi and build.


Now run a transcode from the H.264 of the input file to VP8.  What happens 
varies with the bitrate selected (this is now on Skylake GT2, 6300):


./avconv -y -threads 1 -vaapi_device /dev/dri/renderD128 -hwaccel vaapi 
-hwaccel_output_format vaapi -i bbb_sunflower_1080p_60fps_normal.mp4 -an -c:v 
vp8_vaapi -r 60 -b:v 5M out.webm

(CBR at 5Mbps)  Everything works and the output looks good: yay!  (This is an 
immense improvement over the current driver - if you run the same command 
there, the output is working but terrible quality.)


./avconv -y -threads 1 -vaapi_device /dev/dri/renderD128 -hwaccel vaapi 
-hwaccel_output_format vaapi -i bbb_sunflower_1080p_60fps_normal.mp4 -an -c:v 
vp8_vaapi -r 60 -b:v 2M out.webm

(CBR at 2Mbps)  Mostly works, but the output is broken at times and the bitrate 
target is often missed by a long way.


./avconv -y -threads 1 -vaapi_device /dev/dri/renderD128 -hwaccel vaapi 
-hwaccel_output_format vaapi -i bbb_sunflower_1080p_60fps_normal.mp4 -an -c:v 
vp8_vaapi -r 60 -b:v 1M out.webm

(CBR at 1Mbps)  Consistently hangs at around frame 570.


[1321697.079583] drm/i915: Resetting chip after gpu hang
[1321697.081571] [drm] GuC firmware load skipped
[1321699.063831] [drm] RC6 on

/sys/class/drm/card0/error for one case: 
<http://ixia.jkqxz.net/~mrt/libva/vp8/sys_class_drm_card0_error>.

Backtrace of userspace at the hang point:

#0  0x76351cc7 in ioctl () at ../sysdeps/unix/syscall-template.S:84
#1  0x75821708 in drmIoctl () from /usr/lib/x86_64-linux-gnu/libdrm.so.2
#2  0x748e3e00 in ?? () from /usr/lib/x86_64-linux-gnu/libdrm_intel.so.1
#3  0x748e4036 in ?? () from /usr/lib/x86_64-linux-gnu/libdrm_intel.so.1
#4  0x74bded57 in intel_batchbuffer_flush (batch=0x56c03240) at 
../../src/intel_batchbuffer.c:147
#5  0x74b9c821 in i965_run_kernel_media_object (ctx=0x56ad15b0, 
encoder_context=0x56bfff10, gpe_context=0x56b6da38, media_function=11, 
param=0x7fffd5f0) at ../../src/i965_encoder_vp8.c:2174
#6  0x74baa044 in i965_encoder_vp8_pak_tpu (ctx=0x56ad15b0, 
encode_state=0x56ae5cd0, encoder_context=0x56bfff10) at 
../../src/i965_encoder_vp8.c:6503
#7  0x74baa5a7 in i965_encoder_vp8_pak_pipeline (ctx=0x56ad15b0, 
profile=VAProfileVP8Version0_3, encode_state=0x56ae5cd0, 
encoder_context=0x56bfff10) at ../../src/i965_encoder_vp8.c:6620
#8  0x74b988b3 in intel_encoder_end_picture (ctx=0x56ad15b0, 
profile=VAProfileVP8Version0_3, codec_state=0x56ae5cd0, 
hw_context=0x56bfff10) at ../../src/i965_encoder.c:1313
#9  0x74b8aa23 in i965_EndPicture (ctx=0x56ad15b0, 
context=33554433) at ../../src/i965_drv_video.c:3588
#10 0x776744ca in vaEndPicture (dpy=0x56ad1540, context=33554433) 
at ../../git/va/va.c:1285
#11 0x55df76d8 in vaapi_encode_issue (avctx=0x56b5f100, 
pic=0x56c64d20) at /home/mrt/video/libav/vp8/libavcodec/vaapi_encode.c:387
#12 0x55df7f29 in vaapi_encode_step (avctx=0x56b5f100, 
target=0x56c64d20) at 
/home/mrt/video/libav/vp8/libavcodec/vaapi_encode.c:608
#13 0x55df8be9 in ff_vaapi_encode2 (avctx=0x56b5f100, 
pkt=0x56b19380, input_image=0x56b188c0, got_packet=0x7fffdcfc) at 
/home/mrt/video/libav/vp8/libavcodec/vaapi_encode.c:895
#14 0x557a409b in avcodec_encode_video2 (avctx=0x56b5f100, 
avpkt=0x56b19380, frame=0x56b188c0, got_packet_ptr=0x7fffdcfc) at 
/home/mrt/video/libav/vp8/libavcodec/encode.c:231
#15 0x557a4280 in do_encode (avctx=0x56b5f100, 
frame=0x56b188c0, got_packet=0x7fffdcfc) at 
/home/mrt/video/libav/vp8/libavcodec/encode.c:278
#16 0x557a444a in avcodec_send_frame (avctx=0x56b5f100, 
frame=0x56b188c0) at /home/mrt/video/libav/vp8/libavcodec/encode.c:327
#17 0x555c3c5c in do_video_out (of=0x56c61ba0, ost=0x56c61680, 
in_picture=0x56b188c0, frame_size=0x7fffe1cc) at 
/home/mrt/video/libav/vp8/avconv.c:579
#18 0x555c43ce in poll_filter (ost=0x56c61680) 

Re: [Libva] [PATCH 2/4] Set the pipeline to use the new VP8 encoding shaders on BSW

2017-01-11 Thread Xiang, Haihao

Hi Mark,

Can you reproduce the issue you mentioned below? If yes, I would like to fix it
in the new version of the patch series. 

Thanks
Haihao

> > 
> > On Tue, Jan 10, 2017 at 4:21 PM, Mark Thompson  wrote:
> > > On 10/01/17 22:02, Sean V Kelley wrote:
> > > > From: "Xiang, Haihao" 
> > > > 
> > > > Currently only one temporal layer is supported
> > > > 
> > > > Signed-off-by: Xiang, Haihao 
> > > > Reviewed-by: Sean V Kelley 
> > > > ---
> > > >   src/Makefile.am        |    3 +
> > > >   src/gen8_encoder_vp8.c |  140 +
> > > >   src/gen8_mfc.c         |    8 +-
> > > >   src/gen8_vme.c         |    5 +
> > > >   src/i965_defines.h     |   10 +
> > > >   src/i965_encoder.c     |    2 +
> > > >   src/i965_encoder_vp8.c | 6697
> > > 
> > > >   src/i965_encoder_vp8.h | 2643 +++
> > > >   8 files changed, 9507 insertions(+), 1 deletion(-)
> > > 
> > > I had a go with this on Kaby Lake.  In general, big win - looks like it
> > > can
> > > be under half the bitrate at comparable quality (though it was pretty
> > > terrible before...).
> > > 
> > > However, the rate control seems to do odd things at low bitrate relative
> > > to
> > > the frame size?  I can get GPU hangs and wildly varying output bitrate
> > > with
> > > it, though it seems ok at high bitrate.
> > That's a concern.  Please report the If it really is a GPU hang, I need
> > the error report for the DRM card0 log.
> > 
> > cat /sys/class/drm/card0/error
> > 
> > Please rerun and capture the DRM (i915) card0 error log.
> >  
> > >  
> > > I had a look around the rate control and found two minor issues in the RC
> > > configuration, though I don't think either of them are relevant to my
> > > problem (see below).  I can try to make a reproducer if this is not
> > > already
> > > known?
> > > 
> > Please do attempt to reproduce.  That's why I've put the patches out here to
> > test.
> 
> Thanks for testing the patch, could you detail the steps to reproduce this
> issue?
> 
> 
> > 
> > Thanks,
> > 
> > Sean
> >  
> > >  Thanks,
> > > 
> > > - Mark
> > > 
> > > 
> > > > ...
> > > > +
> > > > +static void
> > > > +i965_encoder_vp8_get_misc_parameters(VADriverContextP ctx,
> > > > +                                     struct encode_state *encode_state,
> > > > +                                     struct intel_encoder_context
> > > *encoder_context)
> > > > +{
> > > > +    struct i965_encoder_vp8_context *vp8_context = encoder_context-
> > > > vme_context;
> > > > +
> > > > +    if (vp8_context->internal_rate_mode == I965_BRC_CQP) {
> > > > +        vp8_context->init_vbv_buffer_fullness_in_bit = 0;
> > > > +        vp8_context->vbv_buffer_size_in_bit = 0;
> > > > +        vp8_context->target_bit_rate = 0;
> > > > +        vp8_context->max_bit_rate = 0;
> > > > +        vp8_context->min_bit_rate = 0;
> > > > +        vp8_context->brc_need_reset = 0;
> > > > +    } else {
> > > > +        vp8_context->gop_size = encoder_context->brc.gop_size;
> > > > +
> > > > +        if (encoder_context->brc.need_reset) {
> > > > +            vp8_context->framerate = encoder_context->brc.framerate[0];
> > > > +            vp8_context->vbv_buffer_size_in_bit = encoder_context-
> > > > brc.hrd_buffer_size;
> > > > +            vp8_context->init_vbv_buffer_fullness_in_bit =
> > > encoder_context->brc.hrd_initial_buffer_fullness;
> > > > +            vp8_context->max_bit_rate = encoder_context-
> > > > brc.bits_per_second[0]; // currently only one layer is supported
> > > > +            vp8_context->brc_need_reset = (vp8_context->brc_initted &&
> > > encoder_context->brc.need_reset);
> > > > +
> > > > +            if (vp8_context->internal_rate_mode == I965_BRC_CBR) {
> > > > +                vp8_context->min_bit_rate = vp8_context->max_bit_rate;
> > > > +                vp8_context->target_bit_rate = vp8_context-
> > > > >max_bit_rate;
> > > > +            } else {
> > > > +                assert(vp8_context->internal_rate_mode ==
> > > > I965_BRC_VBR);
> > > > +                vp8_context->min_bit_rate = vp8_context->max_bit_rate *
> > > (2 * encoder_context->brc.target_percentage[0] - 100) / 100;
> > > 
> > > If target percentage is < 50 then (2 * encoder_context-
> > > > brc.target_percentage[0] - 100) is negative.  Since it's unsigned, you
> > > > end
> > > up with a garbage number in min_bit_rate.
> > That's a concern, also we may need to reconcile this with our handling for
> > VP9
> > encode.
> >  
> > >  
> > > > +                vp8_context->target_bit_rate = vp8_context-
> > > > >max_bit_rate
> > > * encoder_context->brc.target_percentage[0] / 100;
> > > > +            }
> > > > +        }
> > > > +    }
> > > > +
> > > > +    if (encoder_context->quality_level == ENCODER_LOW_QUALITY)
> > > > +        vp8_context->hme_16x_supported = 0;
> > > > +}
> > > > +
> > > > ...
> > > > +
> > > > +static void
> > > > +i965_encoder_vp8_vme_brc_init_reset_set_curbe(VADriverContextP ctx,
> > > > +             

Re: [Libva] [PATCH 2/4] Set the pipeline to use the new VP8 encoding shaders on BSW

2017-01-10 Thread Xiang, Haihao

> 
> On Tue, Jan 10, 2017 at 4:21 PM, Mark Thompson  wrote:
> > On 10/01/17 22:02, Sean V Kelley wrote:
> > > From: "Xiang, Haihao" 
> > >
> > > Currently only one temporal layer is supported
> > >
> > > Signed-off-by: Xiang, Haihao 
> > > Reviewed-by: Sean V Kelley 
> > > ---
> > >  src/Makefile.am        |    3 +
> > >  src/gen8_encoder_vp8.c |  140 +
> > >  src/gen8_mfc.c         |    8 +-
> > >  src/gen8_vme.c         |    5 +
> > >  src/i965_defines.h     |   10 +
> > >  src/i965_encoder.c     |    2 +
> > >  src/i965_encoder_vp8.c | 6697
> > 
> > >  src/i965_encoder_vp8.h | 2643 +++
> > >  8 files changed, 9507 insertions(+), 1 deletion(-)
> > 
> > I had a go with this on Kaby Lake.  In general, big win - looks like it can
> > be under half the bitrate at comparable quality (though it was pretty
> > terrible before...).
> > 
> > However, the rate control seems to do odd things at low bitrate relative to
> > the frame size?  I can get GPU hangs and wildly varying output bitrate with
> > it, though it seems ok at high bitrate.
> That's a concern.  Please report the If it really is a GPU hang, I need
> the error report for the DRM card0 log.
> 
> cat /sys/class/drm/card0/error
> 
> Please rerun and capture the DRM (i915) card0 error log.
>  
> >  
> > I had a look around the rate control and found two minor issues in the RC
> > configuration, though I don't think either of them are relevant to my
> > problem (see below).  I can try to make a reproducer if this is not already
> > known?
> > 
> Please do attempt to reproduce.  That's why I've put the patches out here to
> test.

Thanks for testing the patch, could you detail the steps to reproduce this
issue?


> 
> Thanks,
> 
> Sean
>  
> >  Thanks,
> > 
> > - Mark
> > 
> > 
> > > ...
> > > +
> > > +static void
> > > +i965_encoder_vp8_get_misc_parameters(VADriverContextP ctx,
> > > +                                     struct encode_state *encode_state,
> > > +                                     struct intel_encoder_context
> > *encoder_context)
> > > +{
> > > +    struct i965_encoder_vp8_context *vp8_context = encoder_context-
> > >vme_context;
> > > +
> > > +    if (vp8_context->internal_rate_mode == I965_BRC_CQP) {
> > > +        vp8_context->init_vbv_buffer_fullness_in_bit = 0;
> > > +        vp8_context->vbv_buffer_size_in_bit = 0;
> > > +        vp8_context->target_bit_rate = 0;
> > > +        vp8_context->max_bit_rate = 0;
> > > +        vp8_context->min_bit_rate = 0;
> > > +        vp8_context->brc_need_reset = 0;
> > > +    } else {
> > > +        vp8_context->gop_size = encoder_context->brc.gop_size;
> > > +
> > > +        if (encoder_context->brc.need_reset) {
> > > +            vp8_context->framerate = encoder_context->brc.framerate[0];
> > > +            vp8_context->vbv_buffer_size_in_bit = encoder_context-
> > >brc.hrd_buffer_size;
> > > +            vp8_context->init_vbv_buffer_fullness_in_bit =
> > encoder_context->brc.hrd_initial_buffer_fullness;
> > > +            vp8_context->max_bit_rate = encoder_context-
> > >brc.bits_per_second[0]; // currently only one layer is supported
> > > +            vp8_context->brc_need_reset = (vp8_context->brc_initted &&
> > encoder_context->brc.need_reset);
> > > +
> > > +            if (vp8_context->internal_rate_mode == I965_BRC_CBR) {
> > > +                vp8_context->min_bit_rate = vp8_context->max_bit_rate;
> > > +                vp8_context->target_bit_rate = vp8_context->max_bit_rate;
> > > +            } else {
> > > +                assert(vp8_context->internal_rate_mode == I965_BRC_VBR);
> > > +                vp8_context->min_bit_rate = vp8_context->max_bit_rate *
> > (2 * encoder_context->brc.target_percentage[0] - 100) / 100;
> > 
> > If target percentage is < 50 then (2 * encoder_context-
> > >brc.target_percentage[0] - 100) is negative.  Since it's unsigned, you end
> > up with a garbage number in min_bit_rate.
> That's a concern, also we may need to reconcile this with our handling for VP9
> encode.
>  
> >  
> > > +                vp8_context->target_bit_rate = vp8_context->max_bit_rate
> > * encoder_context->brc.target_percentage[0] / 100;
> > > +            }
> > > +        }
> > > +    }
> > > +
> > > +    if (encoder_context->quality_level == ENCODER_LOW_QUALITY)
> > > +        vp8_context->hme_16x_supported = 0;
> > > +}
> > > +
> > > ...
> > > +
> > > +static void
> > > +i965_encoder_vp8_vme_brc_init_reset_set_curbe(VADriverContextP ctx,
> > > +                                              struct encode_state
> > *encode_state,
> > > +                                              struct
> > intel_encoder_context *encoder_context,
> > > +                                              struct i965_gpe_context
> > *gpe_context)
> > > +{
> > > +    struct i965_encoder_vp8_context *vp8_context = encoder_context-
> > >vme_context;
> > > +    VAEncPictureParameterBufferVP8 *pic_param =
> > (VAEncPictureParameterBufferVP8 

Re: [Libva] [PATCH 2/4] Set the pipeline to use the new VP8 encoding shaders on BSW

2017-01-10 Thread Sean V Kelley
On Tue, Jan 10, 2017 at 4:21 PM, Mark Thompson  wrote:

> On 10/01/17 22:02, Sean V Kelley wrote:
> > From: "Xiang, Haihao" 
> >
> > Currently only one temporal layer is supported
> >
> > Signed-off-by: Xiang, Haihao 
> > Reviewed-by: Sean V Kelley 
> > ---
> >  src/Makefile.am|3 +
> >  src/gen8_encoder_vp8.c |  140 +
> >  src/gen8_mfc.c |8 +-
> >  src/gen8_vme.c |5 +
> >  src/i965_defines.h |   10 +
> >  src/i965_encoder.c |2 +
> >  src/i965_encoder_vp8.c | 6697 ++
> ++
> >  src/i965_encoder_vp8.h | 2643 +++
> >  8 files changed, 9507 insertions(+), 1 deletion(-)
>
> I had a go with this on Kaby Lake.  In general, big win - looks like it
> can be under half the bitrate at comparable quality (though it was pretty
> terrible before...).
>
> However, the rate control seems to do odd things at low bitrate relative
> to the frame size?  I can get GPU hangs and wildly varying output bitrate
> with it, though it seems ok at high bitrate.
>

That's a concern.  If it really is a GPU hang, I need the error report
from the DRM card0 log.

cat /sys/class/drm/card0/error

Please rerun and capture the DRM (i915) card0 error log.
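
For anyone who would rather save the error state from a test harness than from the shell, the capture step above can be sketched in C. This is only an illustration: the sysfs path is the one given in this thread, while the function names `copy_stream` and `save_error_state` and the choice of destination file are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Copy everything readable from `in` to `out`.
 * Returns the number of bytes copied, or -1 on a read or write error. */
long copy_stream(FILE *in, FILE *out)
{
    char buf[4096];
    long total = 0;
    size_t n;

    assert(in != NULL && out != NULL);

    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;
        total += (long)n;
    }
    return ferror(in) ? -1 : total;
}

/* Hypothetical helper: save the error state found at src_path
 * (for example /sys/class/drm/card0/error) into dst_path.
 * Returns the number of bytes saved, or -1 on failure. */
long save_error_state(const char *src_path, const char *dst_path)
{
    FILE *in = fopen(src_path, "r");
    FILE *out;
    long n;

    if (in == NULL)
        return -1;
    out = fopen(dst_path, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    n = copy_stream(in, out);
    fclose(in);
    if (fclose(out) != 0)
        n = -1;
    return n;
}
```

Calling `save_error_state("/sys/class/drm/card0/error", "card0_error.txt")` right after a hang should leave a file that can be attached to a reply. Reading that sysfs file may require root on some systems.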


>
> I had a look around the rate control and found two minor issues in the RC
> configuration, though I don't think either of them are relevant to my
> problem (see below).  I can try to make a reproducer if this is not already
> known?
>
Please do attempt to reproduce.  That's why I've put the patches out here
to test.

Thanks,

Sean


> Thanks,
>
> - Mark
>
>
> > ...
> > +
> > +static void
> > +i965_encoder_vp8_get_misc_parameters(VADriverContextP ctx,
> > + struct encode_state *encode_state,
> > + struct intel_encoder_context
> *encoder_context)
> > +{
> > +struct i965_encoder_vp8_context *vp8_context =
> encoder_context->vme_context;
> > +
> > +if (vp8_context->internal_rate_mode == I965_BRC_CQP) {
> > +vp8_context->init_vbv_buffer_fullness_in_bit = 0;
> > +vp8_context->vbv_buffer_size_in_bit = 0;
> > +vp8_context->target_bit_rate = 0;
> > +vp8_context->max_bit_rate = 0;
> > +vp8_context->min_bit_rate = 0;
> > +vp8_context->brc_need_reset = 0;
> > +} else {
> > +vp8_context->gop_size = encoder_context->brc.gop_size;
> > +
> > +if (encoder_context->brc.need_reset) {
> > +vp8_context->framerate = encoder_context->brc.framerate[0];
> > +vp8_context->vbv_buffer_size_in_bit =
> encoder_context->brc.hrd_buffer_size;
> > +vp8_context->init_vbv_buffer_fullness_in_bit =
> encoder_context->brc.hrd_initial_buffer_fullness;
> > +vp8_context->max_bit_rate = 
> > encoder_context->brc.bits_per_second[0];
> // currently only one layer is supported
> > +vp8_context->brc_need_reset = (vp8_context->brc_initted &&
> encoder_context->brc.need_reset);
> > +
> > +if (vp8_context->internal_rate_mode == I965_BRC_CBR) {
> > +vp8_context->min_bit_rate = vp8_context->max_bit_rate;
> > +vp8_context->target_bit_rate =
> vp8_context->max_bit_rate;
> > +} else {
> > +assert(vp8_context->internal_rate_mode ==
> I965_BRC_VBR);
> > +vp8_context->min_bit_rate = vp8_context->max_bit_rate *
> (2 * encoder_context->brc.target_percentage[0] - 100) / 100;
>
> If target percentage is < 50 then (2 * 
> encoder_context->brc.target_percentage[0]
> - 100) is negative.  Since it's unsigned, you end up with a garbage number
> in min_bit_rate.
>

That's a concern, also we may need to reconcile this with our handling for
VP9 encode.
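
The wraparound described above can be reproduced in isolation. The following is a minimal sketch, assuming 32-bit unsigned operands (the thread only says the value is unsigned and does not show the field types). The function names and the clamp-at-zero behaviour are hypothetical illustrations, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the quoted expression, with unsigned operands.
 * For target_percentage < 50 the term (2 * pct - 100) cannot go
 * negative; it wraps to a value near 2^32 instead. */
uint32_t min_bit_rate_unsigned(uint32_t max_bit_rate,
                               uint32_t target_percentage)
{
    return max_bit_rate * (2 * target_percentage - 100) / 100;
}

/* Hypothetical alternative: treat anything at or below 50% as a
 * minimum of zero, and widen to 64 bits so the multiplication
 * cannot overflow before the division. */
uint32_t min_bit_rate_clamped(uint32_t max_bit_rate,
                              uint32_t target_percentage)
{
    /* Assumption: the percentage is never above 100. */
    assert(target_percentage <= 100);

    if (target_percentage <= 50)
        return 0;
    return (uint32_t)((uint64_t)max_bit_rate *
                      (2 * target_percentage - 100) / 100);
}
```

With max_bit_rate = 1000000 and target_percentage = 40, the first function returns a value far above max_bit_rate, while the second returns 0. Whether zero is the right floor for the driver is a separate design question that the thread leaves open.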


>
> > +vp8_context->target_bit_rate =
> vp8_context->max_bit_rate * encoder_context->brc.target_percentage[0] /
> 100;
> > +}
> > +}
> > +}
> > +
> > +if (encoder_context->quality_level == ENCODER_LOW_QUALITY)
> > +vp8_context->hme_16x_supported = 0;
> > +}
> > +
> > ...
> > +
> > +static void
> > +i965_encoder_vp8_vme_brc_init_reset_set_curbe(VADriverContextP ctx,
> > +  struct encode_state
> *encode_state,
> > +  struct
> intel_encoder_context *encoder_context,
> > +  struct i965_gpe_context
> *gpe_context)
> > +{
> > +struct i965_encoder_vp8_context *vp8_context =
> encoder_context->vme_context;
> > +VAEncPictureParameterBufferVP8 *pic_param = 
> > (VAEncPictureParameterBufferVP8
> *)encode_state->pic_param_ext->buffer;
> > +struct vp8_brc_init_reset_curbe_data *pcmd =
> i965_gpe_context_map_curbe(gpe_context);
> > +double input_bits_per_frame, bps_ratio;
> > +
> > +memset(pcmd, 0, sizeof(*pcmd));
> > +
> > +pcmd->dw0.profile_level_max_frame = vp8_context->frame_width *
> vp8_context->

Re: [Libva] [PATCH 2/4] Set the pipeline to use the new VP8 encoding shaders on BSW

2017-01-10 Thread Mark Thompson
On 10/01/17 22:02, Sean V Kelley wrote:
> From: "Xiang, Haihao" 
> 
> Currently only one temporal layer is supported
> 
> Signed-off-by: Xiang, Haihao 
> Reviewed-by: Sean V Kelley 
> ---
>  src/Makefile.am|3 +
>  src/gen8_encoder_vp8.c |  140 +
>  src/gen8_mfc.c |8 +-
>  src/gen8_vme.c |5 +
>  src/i965_defines.h |   10 +
>  src/i965_encoder.c |2 +
>  src/i965_encoder_vp8.c | 6697
>  src/i965_encoder_vp8.h | 2643 +++
>  8 files changed, 9507 insertions(+), 1 deletion(-)

I had a go with this on Kaby Lake.  In general, big win - looks like it can be 
under half the bitrate at comparable quality (though it was pretty terrible 
before...).

However, the rate control seems to do odd things at low bitrate relative to the 
frame size?  I can get GPU hangs and wildly varying output bitrate with it, 
though it seems ok at high bitrate.

I had a look around the rate control and found two minor issues in the RC 
configuration, though I don't think either of them is relevant to my problem 
(see below).  I can try to make a reproducer if this is not already known?

Thanks,

- Mark


> ...
> +
> +static void
> +i965_encoder_vp8_get_misc_parameters(VADriverContextP ctx,
> + struct encode_state *encode_state,
> + struct intel_encoder_context 
> *encoder_context)
> +{
> +struct i965_encoder_vp8_context *vp8_context = 
> encoder_context->vme_context;
> +
> +if (vp8_context->internal_rate_mode == I965_BRC_CQP) {
> +vp8_context->init_vbv_buffer_fullness_in_bit = 0;
> +vp8_context->vbv_buffer_size_in_bit = 0;
> +vp8_context->target_bit_rate = 0;
> +vp8_context->max_bit_rate = 0;
> +vp8_context->min_bit_rate = 0;
> +vp8_context->brc_need_reset = 0;
> +} else {
> +vp8_context->gop_size = encoder_context->brc.gop_size;
> +
> +if (encoder_context->brc.need_reset) {
> +vp8_context->framerate = encoder_context->brc.framerate[0];
> +vp8_context->vbv_buffer_size_in_bit = 
> encoder_context->brc.hrd_buffer_size;
> +vp8_context->init_vbv_buffer_fullness_in_bit = 
> encoder_context->brc.hrd_initial_buffer_fullness;
> +vp8_context->max_bit_rate = encoder_context->brc.bits_per_second[0]; // currently only one layer is supported
> +vp8_context->brc_need_reset = (vp8_context->brc_initted && 
> encoder_context->brc.need_reset);
> +
> +if (vp8_context->internal_rate_mode == I965_BRC_CBR) {
> +vp8_context->min_bit_rate = vp8_context->max_bit_rate;
> +vp8_context->target_bit_rate = vp8_context->max_bit_rate;
> +} else {
> +assert(vp8_context->internal_rate_mode == I965_BRC_VBR);
> +vp8_context->min_bit_rate = vp8_context->max_bit_rate * (2 * encoder_context->brc.target_percentage[0] - 100) / 100;

If target percentage is < 50 then (2 * 
encoder_context->brc.target_percentage[0] - 100) is negative.  Since it's 
unsigned, you end up with a garbage number in min_bit_rate.
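
To illustrate the wraparound described here, a minimal standalone sketch
follows.  It assumes the bit-rate and target_percentage fields are 32-bit
unsigned (their declarations are not shown in this excerpt).  The function
names min_rate_as_written and min_rate_clamped are hypothetical and not part
of the driver, and clamping to zero at or below 50% is only one possible way
to handle the case, not necessarily what the driver should do.

```c
#include <stdint.h>

/* Mirrors the arithmetic of the quoted VBR branch.
 * Assumption: all operands are 32-bit unsigned. */
static uint32_t min_rate_as_written(uint32_t max_bit_rate,
                                    uint32_t target_percentage)
{
    /* For target_percentage < 50, (2 * p - 100) wraps around to a
     * value close to 2^32 instead of going negative. */
    return max_bit_rate * (2 * target_percentage - 100) / 100;
}

/* Hypothetical alternative: treat anything at or below 50% as a
 * minimum rate of zero, and widen the multiply to avoid overflow. */
static uint32_t min_rate_clamped(uint32_t max_bit_rate,
                                 uint32_t target_percentage)
{
    if (target_percentage <= 50)
        return 0;

    return (uint32_t)((uint64_t)max_bit_rate *
                      (2 * target_percentage - 100) / 100);
}
```

With max_bit_rate = 500000 and a target percentage of 40, the first function
returns a value far above max_bit_rate, while the second returns 0.  For a
target percentage of 75 both return 250000.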

> +vp8_context->target_bit_rate = vp8_context->max_bit_rate * 
> encoder_context->brc.target_percentage[0] / 100;
> +}
> +}
> +}
> +
> +if (encoder_context->quality_level == ENCODER_LOW_QUALITY)
> +vp8_context->hme_16x_supported = 0;
> +}
> +
> ...
> +
> +static void
> +i965_encoder_vp8_vme_brc_init_reset_set_curbe(VADriverContextP ctx,
> +  struct encode_state 
> *encode_state,
> +  struct intel_encoder_context 
> *encoder_context,
> +  struct i965_gpe_context 
> *gpe_context)
> +{
> +struct i965_encoder_vp8_context *vp8_context = 
> encoder_context->vme_context;
> +VAEncPictureParameterBufferVP8 *pic_param = 
> (VAEncPictureParameterBufferVP8 *)encode_state->pic_param_ext->buffer;
> +struct vp8_brc_init_reset_curbe_data *pcmd = 
> i965_gpe_context_map_curbe(gpe_context);
> +double input_bits_per_frame, bps_ratio;
> +
> +memset(pcmd, 0, sizeof(*pcmd));
> +
> +pcmd->dw0.profile_level_max_frame = vp8_context->frame_width * 
> vp8_context->frame_height;
> +pcmd->dw1.init_buf_full_in_bits = 
> vp8_context->init_vbv_buffer_fullness_in_bit;
> +pcmd->dw2.buf_size_in_bits = vp8_context->vbv_buffer_size_in_bit;
> +pcmd->dw3.average_bitrate = ALIGN(vp8_context->target_bit_rate, VP8_BRC_KBPS) / VP8_BRC_KBPS * VP8_BRC_KBPS;
> +pcmd->dw4.max_bitrate = ALIGN(vp8_context->max_bit_rate, VP8_BRC_KBPS) / VP8_BRC_KBPS * VP8_BRC_KBPS;

VP8_BRC_KBPS is 1000 which is not a power of two, so the ALIGN macro isn't 
doing anything sensible here.
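
A small sketch of why this matters, assuming ALIGN is the usual mask-based
macro (the driver's definition is not shown in this thread, so ALIGN_POW2
below is a stand-in).  The helper names kbps_as_written and
round_up_to_multiple are hypothetical, and rounding up to a whole number of
kbps is only a guess at the intent.

```c
#include <stdint.h>

/* Assumed mask-based alignment; only meaningful when n is a power of two. */
#define ALIGN_POW2(i, n) (((i) + (n) - 1) & ~((n) - 1))

/* Stated in the review to be 1000. */
#define VP8_BRC_KBPS 1000

/* Mirrors the quoted expression: mask-align, then divide and multiply. */
static uint32_t kbps_as_written(uint32_t bit_rate)
{
    return ALIGN_POW2(bit_rate, VP8_BRC_KBPS) / VP8_BRC_KBPS * VP8_BRC_KBPS;
}

/* Rounds value up to the next multiple of `multiple`; works for any
 * non-zero multiple, not just powers of two. */
static uint32_t round_up_to_multiple(uint32_t value, uint32_t multiple)
{
    return (value + multiple - 1) / multiple * multiple;
}
```

For an input of 500001 the mask step yields 500744, which is not a multiple of
1000, and the full expression then truncates to 500000.  The division-based
helper gives 501000 for the same input.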

> +pcmd->dw6.frame_rate_m = vp8_context->framerate.num;
> +pcmd->dw7.frame_rate_d = vp8_context->framerate.den;